As required by RFC 2616 section 3.2.2, for all HTTP requests sent by urllib2, an empty path component of the URI should be normalized to "/" before the Request-URI derived from it gets passed to httplib (or something functionally equivalent to that). This was fixed in one case in #2464, but the fix is in the wrong place, since it's a general problem not specific to redirects. See the longer discussion here: http://bugs.python.org/msg76736 (hmm, let's see if I can just say and get a hyperlink)

Example:

    import urllib2
    urllib2.urlopen("http://python.org?spam")

Expect: sends "/?spam" in request line.
Got: sends "?spam" in request line.

Probably should be fixed by making Request.get_selector() return the normalized URI reference (with the slash always present). When fixing, remember that the Request-URI of RFC 2616 (returned by .get_selector()) is sometimes a relative reference, and sometimes a URI (in RFC 3986's terminology).
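To make the expected behaviour concrete, here is a minimal sketch of deriving the Request-URI from a URL with the empty path normalized. It is written against Python 3's urllib.parse rather than urllib2, the helper name request_uri is hypothetical, and it only covers the relative-reference case (not the proxy case, where the Request-URI is a full URI).

```python
from urllib.parse import urlsplit


def request_uri(url):
    """Return the path-and-query form of the Request-URI for `url`.

    Hypothetical helper for illustration only: an empty path is
    replaced by "/" as RFC 2616 section 3.2.2 requires.
    """
    parts = urlsplit(url)
    # An absent path must be sent as "/" in the request line.
    path = parts.path or "/"
    if parts.query:
        return path + "?" + parts.query
    return path


print(request_uri("http://python.org?spam"))
```

With the URL from the example above, this yields the expected "/?spam" rather than "?spam".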
Attached is a patch against 3.2 that replaces empty paths with '/' in HTTPConnection. I do not totally understand the ; syntax in URIs, and so this implementation may break that, as it splits urls and unsplits them if needed. The Python docs seem to indicate there might be some obscure cases where this is problematic. And yes, I do realize that this patch fixes the problem in yet another place. Hopefully HTTPConnection is the lowest common denominator.
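The split-and-unsplit approach described above can be sketched roughly as follows. This is not the attached patch, only an illustration under assumptions: the function name normalize_empty_path is hypothetical, and it uses urlsplit (which leaves any ";" parameters inside the path) and returns the input untouched whenever the path is non-empty, so the round trip through urlunsplit only happens when a fix is actually needed.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_empty_path(url):
    """Return `url` with an empty path replaced by "/".

    Hypothetical helper, not the code from the patch. Works on both
    full URLs and relative references such as "?spam".
    """
    parts = urlsplit(url)
    if parts.path:
        # Nothing to fix; avoid re-serializing the URL at all.
        return url
    # SplitResult is a namedtuple, so _replace gives a modified copy.
    return urlunsplit(parts._replace(path="/"))


print(normalize_empty_path("?spam"))
print(normalize_empty_path("http://python.org?spam"))
```

Returning the original string when the path is present sidesteps most of the obscure cases where splitting and unsplitting does not reproduce the input exactly.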
Fixed it in r86676 (py3k), r86677 (release31-maint) and r86678 (release27-maint). Wes: I fixed it at a much higher level, in urlparse itself, so that the fixed URL is sent to httplib. In , John had pointed out that, according to STD 66, the path component can legally be empty, so when it is empty this adding of '/' does not take place. Also added tests and NEWS.
This same bug also exists in HTTPClient, and my patch addresses that. Addressing it in HTTPClient has a side effect of taking care of it for urllib2 as well (and all future libraries that use HTTPClient). Even if the urllib2 patch is preferable, shouldn't we fix the problem in HTTPClient as well?
Wes, I forgot to address your last comment. HTTPClient follows the HTTP spec for requests and responses. When it is used, the request is made on the path, and the code there checks whether the path exists and, if it does not, does a request on '/'. It is not appropriate to pass invalid URLs to HTTPClient; invalid URL handling and the corrections for it are handled at a much higher level. That's why I made those changes in urllib.