Issue 947571: urllib.urlopen() fails to raise exception

Created on 2004-05-04 09:57 by lemburg, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (7)
msg20681 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2004-05-04 09:57
I've come across a strange problem: even though the docs say that urllib.urlopen() should raise an IOError for server errors (e.g. 404s), all versions of Python that I've tested (1.5.2 - 2.3) fail to do so.

Example:

    >>> import urllib
    >>> f = urllib.urlopen('http://www.example.net/this-url-does-not-exist/')
    >>> print f.read()
    404 Not Found

    Not Found

    The requested URL /this-url-does-not-exist/ was not found on this server.

    Apache/1.3.27 Server at www.example.com Port 80

Either the docs are wrong, or the implementation has a really long-standing bug, or I am missing something.
msg20682 - Author: Walter Dörwald (doerwalter) (Python committer) Date: 2004-06-02 18:29
This seems to work with urllib2:

    >>> import urllib2
    >>> f = urllib2.urlopen('http://www.example.net/this-url-does-not-exist/')
    Traceback (most recent call last):
      File "", line 1, in ?
      File "/usr/local/lib/python2.3/urllib2.py", line 129, in urlopen
        return _opener.open(url, data)
      File "/usr/local/lib/python2.3/urllib2.py", line 326, in open
        '_open', req)
      File "/usr/local/lib/python2.3/urllib2.py", line 306, in _call_chain
        result = func(*args)
      File "/usr/local/lib/python2.3/urllib2.py", line 901, in http_open
        return self.do_open(httplib.HTTP, req)
      File "/usr/local/lib/python2.3/urllib2.py", line 895, in do_open
        return self.parent.error('http', req, fp, code, msg, hdrs)
      File "/usr/local/lib/python2.3/urllib2.py", line 352, in error
        return self._call_chain(*args)
      File "/usr/local/lib/python2.3/urllib2.py", line 306, in _call_chain
        result = func(*args)
      File "/usr/local/lib/python2.3/urllib2.py", line 412, in http_error_default
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    urllib2.HTTPError: HTTP Error 404: Not Found
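For readers on Python 3, where urllib2 became urllib.request, the same raise-on-error behaviour can be sketched without touching the network. This is only an illustration under stated assumptions: AlwaysNotFound, classify and demo are hypothetical names, the server is a throwaway local one that answers every request with a 404, and proxies are disabled so the request stays on localhost.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class AlwaysNotFound(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a 404."""

    def do_GET(self):
        body = b"Not Found"
        self.send_response(404)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def classify(opener, url):
    """Report whether opening url returned a response or raised HTTPError."""
    try:
        with opener.open(url) as response:
            return ("returned", response.status)
    except urllib.error.HTTPError as err:
        code = err.code
        err.close()  # HTTPError wraps the response, so release it
        return ("raised", code)


def demo():
    # Port 0 asks the OS for any free port (an assumption of this sketch).
    server = HTTPServer(("127.0.0.1", 0), AlwaysNotFound)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        # An empty ProxyHandler disables environment proxies; error handling
        # is otherwise the default one that urllib.request.urlopen() uses.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        url = "http://127.0.0.1:%d/this-url-does-not-exist/" % server.server_port
        return classify(opener, url)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The caught HTTPError still carries the status code and the response body, so callers who want the error page can read it from the exception.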
msg20683 - Author: Mike Brown (mike_j_brown) Date: 2004-07-10 18:25
In urllib.FancyURLopener, which is the class used by urllib.urlopen(), there is this override of URLopener's http_error_default:

    def http_error_default(self, url, fp, errcode, errmsg, headers):
        """Default error handling -- don't raise an exception."""
        return addinfourl(fp, headers, "http:" + url)

I don't see how this is really all that desirable, but nevertheless it appears to be quite deliberate.

It looks like the intent in urlopen is that if you want to use some other opener besides an instance of FancyURLopener, you can set urllib._urlopener. This seems to work:

    >>> import urllib
    >>> class MyUrlOpener(urllib.FancyURLopener):
    ...     def http_error_default(*args, **kwargs):
    ...         return urllib.URLopener.http_error_default(*args, **kwargs)
    ...
    >>> urllib._urlopener = MyUrlOpener()
    >>> urllib.urlopen('http://www.example.com/this-url-does-not-exist/')
    Traceback (most recent call last):
      File "", line 1, in ?
      File "/usr/local/lib/python2.3/urllib.py", line 76, in urlopen
        return opener.open(url)
      File "/usr/local/lib/python2.3/urllib.py", line 181, in open
        return getattr(self, name)(url)
      File "/usr/local/lib/python2.3/urllib.py", line 306, in open_http
        return self.http_error(url, fp, errcode, errmsg, headers)
      File "/usr/local/lib/python2.3/urllib.py", line 323, in http_error
        return self.http_error_default(url, fp, errcode, errmsg, headers)
      File "", line 3, in http_error_default
      File "/usr/local/lib/python2.3/urllib.py", line 329, in http_error_default
        raise IOError, ('http error', errcode, errmsg, headers)
    IOError: ('http error', 404, 'Not Found', <httplib.HTTPMessage instance at 0x836298c>)
msg20684 - Author: Mike Brown (mike_j_brown) Date: 2004-07-10 18:39
I suppose I could've made that example a little simpler:

    class ErrorRecognizingURLopener(urllib.FancyURLopener):
        http_error_default = urllib.URLopener.http_error_default

    urllib._urlopener = ErrorRecognizingURLopener()
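The mechanism behind that shortened override can be sketched with a toy model that needs no network. BaseOpener, LenientOpener and StrictOpener are hypothetical stand-ins for URLopener, FancyURLopener and ErrorRecognizingURLopener; they are not the real urllib classes, and the HTTP status is passed in by hand rather than read from a server.

```python
class BaseOpener:
    """Stand-in for URLopener: unhandled HTTP errors raise IOError."""

    def open(self, url, status=404, reason="Not Found"):
        # A real opener would contact the server; here the status is given.
        if status >= 400:
            return self.http_error_default(url, status, reason)
        return "body of %s" % url

    def http_error_default(self, url, status, reason):
        raise IOError("http error %d %s" % (status, reason))


class LenientOpener(BaseOpener):
    """Stand-in for FancyURLopener: return the error page, do not raise."""

    def http_error_default(self, url, status, reason):
        return "error page for %s (%d %s)" % (url, status, reason)


class StrictOpener(LenientOpener):
    """Restore the base class behaviour by rebinding the method."""

    http_error_default = BaseOpener.http_error_default


if __name__ == "__main__":
    url = "http://www.example.com/this-url-does-not-exist/"
    print(LenientOpener().open(url))
    try:
        StrictOpener().open(url)
    except IOError as err:
        print("raised:", err)
```

Assigning the base class function as a class attribute skips the lenient override in the method resolution order, which is all the one-line version in the message relies on.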
msg20685 - Author: John J Lee (jjlee) Date: 2004-07-10 19:08
Seems a mistake to change this now. The current behaviour should be documented, though, of course.
msg20686 - Author: Mike Brown (mike_j_brown) Date: 2004-07-10 19:20
I suggest closing as Won't Fix or Not A Bug, but changing the documentation for urllib.urlopen() to read:

    """urlopen(url [, data]) -> open file-like object using urllib._urlopener, which will be an instance of FancyURLopener if not already set."""

The onus is still on the user to notice in the docs that FancyURLopener will ignore HTTP error responses for which it does not have an explicit handler, but at least this way they'll be pointed in the right direction.
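Since a lenient opener leaves the checking to the caller, here is a hedged Python 3 sketch of what that looks like with urllib.request: an opener that returns error responses instead of raising, followed by an explicit status check. PassThroughErrors, AlwaysNotFound and demo_lenient are hypothetical names, and the local 404 server and the disabled proxies are assumptions made so the example runs offline. This is not how the Python 2 FancyURLopener is implemented.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class PassThroughErrors(urllib.request.HTTPErrorProcessor):
    """Hypothetical handler: hand back 4xx/5xx responses instead of raising."""

    def http_response(self, request, response):
        return response

    https_response = http_response


class AlwaysNotFound(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a 404."""

    def do_GET(self):
        body = b"Not Found"
        self.send_response(404)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def open_leniently(url):
    """Return (status, body) without raising on HTTP error statuses."""
    opener = urllib.request.build_opener(
        PassThroughErrors,                 # replaces the default error processor
        urllib.request.ProxyHandler({}),   # empty mapping disables proxies
    )
    with opener.open(url) as response:
        return response.status, response.read()


def demo_lenient():
    server = HTTPServer(("127.0.0.1", 0), AlwaysNotFound)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/this-url-does-not-exist/" % server.server_port
        return open_leniently(url)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, body = demo_lenient()
    if status >= 400:
        print("server reported an error:", status)
```

With an opener like this, forgetting the status check reproduces the surprise described at the top of the thread: the error page is read as if it were the requested document.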
msg20687 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2006-02-20 21:26
Committed an addition to the docs in rev. 42527, 42528.
History
Date                 User     Action  Args
2022-04-11 14:56:04  admin    set     github: 40216
2004-05-04 09:57:59  lemburg  create