Issue 947571: urllib.urlopen() fails to raise exception
Created on 2004-05-04 09:57 by lemburg, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Messages (7) | ||
|---|---|---|
| msg20681 - (view) | Author: Marc-Andre Lemburg (lemburg) | Date: 2004-05-04 09:57 |

I've come across a strange problem: even though the docs say that urllib.urlopen() should raise an IOError for server errors (e.g. 404s), all versions of Python that I've tested (1.5.2 - 2.3) fail to do so.

Example:

    >>> import urllib
    >>> f = urllib.urlopen('http://www.example.net/this-url-does-not-exist/')
    >>> print f.read()
    404 Not Found
    Not Found
    The requested URL /this-url-does-not-exist/ was not found on this server.
    Apache/1.3.27 Server at www.example.com Port 80

Either the docs are wrong, or the implementation has a really long-standing bug, or I am missing something.

| msg20682 - (view) | Author: Walter Dörwald (doerwalter) | Date: 2004-06-02 18:29 |

This seems to work with urllib2:

    >>> import urllib2
    >>> f = urllib2.urlopen('http://www.example.net/this-url-does-not-exist/')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/usr/local/lib/python2.3/urllib2.py", line 129, in urlopen
        return _opener.open(url, data)
      File "/usr/local/lib/python2.3/urllib2.py", line 326, in open
        '_open', req)
      File "/usr/local/lib/python2.3/urllib2.py", line 306, in _call_chain
        result = func(*args)
      File "/usr/local/lib/python2.3/urllib2.py", line 901, in http_open
        return self.do_open(httplib.HTTP, req)
      File "/usr/local/lib/python2.3/urllib2.py", line 895, in do_open
        return self.parent.error('http', req, fp, code, msg, hdrs)
      File "/usr/local/lib/python2.3/urllib2.py", line 352, in error
        return self._call_chain(*args)
      File "/usr/local/lib/python2.3/urllib2.py", line 306, in _call_chain
        result = func(*args)
      File "/usr/local/lib/python2.3/urllib2.py", line 412, in http_error_default
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    urllib2.HTTPError: HTTP Error 404: Not Found

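
For comparison with the urllib2 behaviour reported in msg20682, here is a minimal sketch for Python 3, where urllib2 was folded into urllib.request and urllib.error. It is not part of the original thread. The local test server, the `AlwaysNotFound` handler, and `fetch_status` are hypothetical names introduced so that the example runs without network access. The opener is built with an empty `ProxyHandler` only to keep environment proxy settings away from the local request; it otherwise carries the same default error handlers that `urllib.request.urlopen()` relies on.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class AlwaysNotFound(BaseHTTPRequestHandler):
    """Hypothetical test handler: answers every GET with a 404."""

    def do_GET(self):
        self.send_error(404, "Not Found")

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the example quiet


# Assumption: bypass any proxy configured in the environment.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_status(url):
    """Return the HTTP status for url, whether or not the opener raises."""
    try:
        with opener.open(url, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Error statuses such as 404 arrive here as exceptions.
        return err.code


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), AlwaysNotFound)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    port = server.server_address[1]
    status = fetch_status("http://127.0.0.1:%d/this-url-does-not-exist/" % port)
finally:
    server.shutdown()
    server.server_close()

print(status)  # 404, delivered via HTTPError rather than a normal response
```

Catching `HTTPError` explicitly, as `fetch_status` does, is one way to handle both outcomes in a single place.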
| msg20683 - (view) | Author: Mike Brown (mike_j_brown) | Date: 2004-07-10 18:25 |

In urllib.FancyURLopener, which is the class used by urllib.urlopen(), there is this override of URLopener's http_error_default:

    def http_error_default(self, url, fp, errcode, errmsg, headers):
        """Default error handling -- don't raise an exception."""
        return addinfourl(fp, headers, "http:" + url)

I don't see how this is really all that desirable, but nevertheless it appears to be quite deliberate. It looks like the intent in urlopen is that if you want to use some other opener besides an instance of FancyURLopener, you can set urllib._urlopener. This seems to work:

    >>> import urllib
    >>> class MyUrlOpener(urllib.FancyURLopener):
    ...     def http_error_default(*args, **kwargs):
    ...         return urllib.URLopener.http_error_default(*args, **kwargs)
    ...
    >>> urllib._urlopener = MyUrlOpener()
    >>> urllib.urlopen('http://www.example.com/this-url-does-not-exist/')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/usr/local/lib/python2.3/urllib.py", line 76, in urlopen
        return opener.open(url)
      File "/usr/local/lib/python2.3/urllib.py", line 181, in open
        return getattr(self, name)(url)
      File "/usr/local/lib/python2.3/urllib.py", line 306, in open_http
        return self.http_error(url, fp, errcode, errmsg, headers)
      File "/usr/local/lib/python2.3/urllib.py", line 323, in http_error
        return self.http_error_default(url, fp, errcode, errmsg, headers)
      File "<stdin>", line 3, in http_error_default
      File "/usr/local/lib/python2.3/urllib.py", line 329, in http_error_default
        raise IOError, ('http error', errcode, errmsg, headers)
    IOError: ('http error', 404, 'Not Found', <httplib.HTTPMessage instance at 0x836298c>)

| msg20684 - (view) | Author: Mike Brown (mike_j_brown) | Date: 2004-07-10 18:39 |

I suppose I could've made that example a little simpler:

    class ErrorRecognizingURLopener(urllib.FancyURLopener):
        http_error_default = urllib.URLopener.http_error_default

    urllib._urlopener = ErrorRecognizingURLopener()

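
The override-and-restore pattern from msg20683 and msg20684 can be sketched without the Python 2 urllib module. The classes below are hypothetical stand-ins, not the real URLopener and FancyURLopener, and the shortened `http_error` signature is an assumption made to keep the example small. The point is only the last class: rebinding `http_error_default` to the base implementation brings the raising behaviour back.

```python
class BaseOpener:
    """Stand-in for URLopener: unhandled HTTP errors raise IOError."""

    def http_error(self, url, errcode, errmsg):
        # Prefer a handler named after the status code, else use the default.
        handler = getattr(self, "http_error_%d" % errcode, None)
        if handler is not None:
            return handler(url, errcode, errmsg)
        return self.http_error_default(url, errcode, errmsg)

    def http_error_default(self, url, errcode, errmsg):
        raise IOError("http error %d: %s" % (errcode, errmsg))


class LenientOpener(BaseOpener):
    """Stand-in for FancyURLopener: the default handler does not raise."""

    def http_error_default(self, url, errcode, errmsg):
        return "error page for " + url


class ErrorRecognizingOpener(LenientOpener):
    # Rebind the attribute to the base implementation, as in msg20684.
    http_error_default = BaseOpener.http_error_default


url = "http://www.example.com/this-url-does-not-exist/"

# The lenient opener hands back a result for the 404.
lenient_result = LenientOpener().http_error(url, 404, "Not Found")

# The subclass with the restored default raises instead.
try:
    ErrorRecognizingOpener().http_error(url, 404, "Not Found")
    strict_raised = False
except IOError:
    strict_raised = True
```

Assigning the method as a class attribute avoids writing a wrapper that only forwards its arguments, which is the simplification msg20684 makes over msg20683.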
| msg20685 - (view) | Author: John J Lee (jjlee) | Date: 2004-07-10 19:08 |

Seems a mistake to change this now. The current behaviour should be documented, though, of course.

| msg20686 - (view) | Author: Mike Brown (mike_j_brown) | Date: 2004-07-10 19:20 |

I suggest closing as Won't Fix or Not A Bug, but changing the documentation for urllib.urlopen() to read:

    """urlopen(url [, data]) -> open file-like object using urllib._urlopener, which will be an instance of FancyURLopener if not already set."""

The onus is still on the user to notice in the docs that FancyURLopener will ignore HTTP error responses for which it does not have an explicit handler, but this way they'll at least be pointed in the right direction.

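
The docstring wording proposed in msg20686 describes a module-level default opener that is created on first use and can be replaced by the caller. A minimal sketch of that arrangement follows. Every name in it (`_fake_fetch`, `LenientOpener`, `StrictOpener`, `set_default_opener`, `open_url`) is a hypothetical illustration, not a urllib API, and the fake transport exists only so the example runs offline.

```python
def _fake_fetch(url):
    """Hypothetical transport: URLs ending in '/missing' answer 404."""
    if url.endswith("/missing"):
        return 404, "Not Found"
    return 200, "OK"


class LenientOpener:
    """Returns the response even when the status signals an error."""

    def open(self, url):
        status, reason = _fake_fetch(url)
        return "%d %s" % (status, reason)


class StrictOpener(LenientOpener):
    """Raises IOError for any status of 400 or above."""

    def open(self, url):
        status, reason = _fake_fetch(url)
        if status >= 400:
            raise IOError("http error %d: %s" % (status, reason))
        return "%d %s" % (status, reason)


_default_opener = None  # plays the role of urllib._urlopener


def set_default_opener(opener):
    """Install the opener used by open_url(); None restores lazy creation."""
    global _default_opener
    _default_opener = opener


def open_url(url):
    """Open url with the module-level opener, creating a lenient one if unset."""
    global _default_opener
    if _default_opener is None:
        _default_opener = LenientOpener()
    return _default_opener.open(url)
```

With nothing installed, `open_url` falls back to the lenient opener and returns the error response. After `set_default_opener(StrictOpener())`, the same call raises.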
| msg20687 - (view) | Author: Georg Brandl (georg.brandl) | Date: 2006-02-20 21:26 |

Committed an addition to the docs in rev. 42527, 42528.

History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:56:04 | admin | set | github: 40216 |
| 2004-05-04 09:57:59 | lemburg | create | |
