msg45002 - (view) |
Author: John J Lee (jjlee) |
Date: 2003-12-03 00:53 |
Here are some unit tests for urllib2 and a revised version of my urllib2 "processors" patch (originally posted as RFE 759792 -- I'm posting it here since it is a patch, not just a wish). The tests depend on the patch, but test more than just the changes introduced by the patch. A fuller discussion is in the original RFE tracker item, but briefly: the patch makes it possible to implement functionality like HTTP cookie handling, Refresh handling, etc. etc. using handler objects. At the moment urllib2's handler objects aren't quite up to the job, which results in a lot of cut-n-paste and subclassing. I believe the changes are backwards-compatible, with the exception of people who've reimplemented build_opener()'s functionality -- those people would need to call opener.add_handler(HTTPErrorProcessor). The main change is allowing handlers to implement methods like: http_request(request) http_response(request, response) In addition to the usual http_open(request) http_error{_*}(...) I call handlers that implement these methods "processors". These methods get called for *every* processor (in contrast to the ordinary handler methods, where the OpenerDirector stops calling the methods as soon as the first handler handles the request by returning a response) to pre-process requests and post-process responses. If this is accepted, I can submit patches for handlers (processors) that do HTTP Refresh redirection, cookie handling etc. Changes in the patch: -OpenerDirector changes to call new _request and _response methods. I haven't put all the documentation for this interface in this set of patches because there's no obvious place for it: handlers aren't really documented either. The urllib2 docs need a cleanup, but I'll do that in a separate patch. -Added .unredirected_hdrs dict to Request, together with .add_unredirected_headers() and .has_header() methods. These headers don't get copied to redirected requests. I didn't add this as a feature for people calling urlopen on a Request. Rather, the motivation comes from the fact that processors need to explicitly add headers to Requests (Cookie, Referer, Content-Length, etc.), rather than directly sending them over the wire. The problem is, if they add them to the regular .headers attribute of requests, processors will end up clobbering headers added by the user who called urlopen (which would break backwards-compatibility). Having processors use a separate set of headers that never get redirected makes this problem go away: users can add headers (with either .add_header() or .add_unredirected_header(), since processors don't clobber either) and know that they won't get clobbered by any handler. -HTTPErrorProcessor is necessary to allow response processors to see responses before redirections &c happen, by moving the call to parent.error() out of AbstractHTTPHandler.do_open(). It has the side-effect of stopping people grumbling that 200 is not the only success code in HTTP <0.5 wink>, since it makes it feasible to override urllib2's behaviour of raising an exception unless the HTTP code == 200. -Split part of AbstractHTTPHandler.do_open (which implements http_open / https_open in the HTTP/HTTPSHandler subclasses) into a new .do_request (which implements http_request in the subclasses). Just because I could, really (with the new *_request methods). It seems clearer to me. -Single string-formatting-character change to OpenerDirector.error() to allow "refresh" as an error code. -Added .code and .msg attributes to HTTP response objects, so that processors can know what the response code and message are. I haven't documented these, because they're HTTP-specific. -Renamed HTTPRedirectHandler.error_302_dict --> .redirect_dict -Finally, there's one bugfix to HTTPRedirectHandler included in the patch, because the tests test for it: multiple visits to a single URL with different redirect codes is no longer erroneously detected as a loop. http://a.com/a --> 302 --> http://a.com/b --> Refresh --> http://a.com/a Yes, I have seen a site where this really happens! There are a few other bugs that turned up while writing the tests, and those tests are commented out ATM. I'll file bug reports for those separately after this one is sorted out. |
|
|
msg45003 - (view) |
Author: John J Lee (jjlee) |
Date: 2003-12-08 00:04 |
Logged In: YES user_id=261020 Oops, I left a couple of lines in urllib2.py that add an HTTP debugging method and constructor arg. Please ignore that part of the patch. |
|
|
msg45004 - (view) |
Author: Jeremy Hylton (jhylton)  |
Date: 2003-12-14 05:28 |
Logged In: YES user_id=31392 Committed as rev 1.57 of urllib2.py |
|
|
msg45005 - (view) |
Author: John J Lee (jjlee) |
Date: 2003-12-14 15:17 |
Logged In: YES user_id=261020 Just a note that my justification of Request.add_unredirected_hdrs above makes no sense. :-/ I mis-remembered my own reasons for adding this. My original comment from 759792 is correct: Some headers (Content-Length, Referer, ...) mustn't be copied to requests for a redirected URL. This requires the addition of a new dict to Request. I've added an add_unredirected_headers method, too. The old implementation just sends these headers directly, but that's not possible if you want to use procesors to implement these things. |
|
|
msg45006 - (view) |
Author: John J Lee (jjlee) |
Date: 2003-12-14 16:11 |
Logged In: YES user_id=261020 Reopening to add a note to Misc/NEWS about the backwards-incompatibility for people who don't use build_opener(). I presume that's the right place for this note? |
|
|
msg45007 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2004-01-01 04:00 |
Logged In: YES user_id=80475 FWIW, the test for urllib2 is currently failing and needs to be fixed. |
|
|
msg45008 - (view) |
Author: John J Lee (jjlee) |
Date: 2004-01-04 23:51 |
Logged In: YES user_id=261020 Fixed tests in patch 870606. |
|
|
msg45009 - (view) |
Author: Jeff Epler (jepler) |
Date: 2005-02-08 13:09 |
Logged In: YES user_id=2772 The fedora folks claim that this patch has caused "yum", an apt-like tool for RPMs, to break in python 2.4.0. I haven't been able to verify this, and nobody has come up with a simple, standalone script to produce the problem. https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=138535 |
|
|
msg45010 - (view) |
Author: John J Lee (jjlee) |
Date: 2005-02-12 03:14 |
Logged In: YES user_id=261020 I added a detailed comment on the yum/urllib2 issue to the bugzilla tracker (as user jjl at pobox dot com). As I note there, though this is strictly a yum bug, if there is a good, safe way of fixing urllib2 here to be more forgiving, I think it should be done. If I think of one when I look at it again, will submit a patch... |
|
|