Issue 4493: urllib2 doesn't always supply / where URI path component is empty

Created on 2008-12-02 20:46 by jjlee, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
empty-path-4493.patch weschow, 2010-11-20 22:21
Messages (5)
msg76777 - Author: John J Lee (jjlee) Date: 2008-12-02 20:46
As required by RFC 2616 section 3.2.2, for all HTTP requests sent by urllib2, the path component of the URI should be normalized to "/" before the Request-URI derived from it gets passed to httplib (or something functionally equivalent to that).

This was fixed in one case in #2464, but the fix is in the wrong place, since it's a general problem not specific to redirects. See the longer discussion here: http://bugs.python.org/msg76736 (hmm, let's see if I can just say and get a hyperlink)

Example:

    import urllib2
    urllib2.urlopen("http://python.org?spam")

Expect: sends "/?spam" in request line.
Got: sends "?spam" in request line.

Probably should be fixed by making Request.get_selector() return the normalized URI reference (with the slash always present). When fixing, remember that the Request-URI of RFC 2616 (returned by .get_selector()) is sometimes a relative reference, and sometimes a URI (in RFC 3986's terminology).
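The expected behavior in the report can be sketched as follows. This is a minimal illustration in Python 3 using `urllib.parse` (the report itself concerns Python 2's urllib2). The helper name `request_target` is hypothetical and is not part of urllib2 or httplib, and the sketch only covers the relative-reference form of the Request-URI, not the full-URI form.

```python
from urllib.parse import urlsplit


def request_target(url):
    """Return the path-and-query part of an HTTP request line for url.

    Hypothetical helper for illustration only: an empty path is
    replaced by "/" so the request line never starts with "?".
    """
    parts = urlsplit(url)
    path = parts.path or "/"  # empty path -> "/"
    if parts.query:
        return path + "?" + parts.query
    return path


print(request_target("http://python.org?spam"))      # /?spam
print(request_target("http://python.org/doc?spam"))  # /doc?spam
```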
msg121797 - Author: Wes Chow (weschow) Date: 2010-11-20 22:21
Attached is a patch against 3.2 that replaces empty paths with '/' in HTTPConnection. I do not totally understand the ; syntax in URIs, and so this implementation may break that, as it splits urls and unsplits them if needed. The Python docs seem to indicate there might be some obscure cases where this is problematic. And yes, I do realize that this patch fixes the problem in yet another place. Hopefully HTTPConnection is the lowest common denominator.
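On the ";" concern: in Python 3's `urllib.parse`, `urlsplit` leaves `;parameters` inside the path, while `urlparse` separates the parameters of the last path segment into their own field. The sketch below, using a made-up URL, suggests that a split/unsplit round trip through `urlsplit` keeps the ";" part intact when only an empty path is replaced. The helper `ensure_path` is hypothetical and is not the attached patch.

```python
from urllib.parse import urlparse, urlsplit, urlunsplit

url = "http://example.com/a;type=d?q=1"  # made-up URL for illustration

# urlsplit keeps ";type=d" as part of the path.
print(urlsplit(url).path)
# urlparse moves it into the separate .params field.
print(urlparse(url).path, urlparse(url).params)


def ensure_path(url):
    """Hypothetical helper: replace an empty path with "/" and rebuild."""
    parts = urlsplit(url)
    if parts.netloc and not parts.path:
        parts = parts._replace(path="/")
    return urlunsplit(parts)


print(ensure_path("http://example.com?q=1"))
print(ensure_path(url) == url)
```

Edge cases such as a ";" directly after the host, or a URL ending in a bare "?", are not covered here and may well round-trip differently.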
msg122094 - Author: Senthil Kumaran (orsenthil) (Python committer) Date: 2010-11-22 05:06
Fixed in r86676 (py3k), r86677 (release31-maint) and r86678 (release27-maint). Wes: I fixed it at a much higher level, in urlparse itself, so that the corrected URL is what gets sent to httplib. In , John had pointed out that according to STD 66 the path component can legally be empty, so when it is empty the '/' is not added. Also added tests and a NEWS entry.
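The rule described in this message (prefix "/" when something follows the host but does not start with "/", and leave a completely empty path alone) might look roughly like the sketch below. It is an illustration written for this thread, not the code committed in r86676; the function name `split_host` and the regular expression are assumptions.

```python
import re

# Matches "//host[:port]" followed by whatever remains of the URL.
# Assumption: fragments have already been stripped by the caller.
_HOST_RE = re.compile(r"^//([^/?]*)(.*)$", re.DOTALL)


def split_host(rest):
    """Hypothetical helper: split '//host/path' into (host, selector)."""
    match = _HOST_RE.match(rest)
    if match is None:
        return None, rest
    host, selector = match.groups()
    # Add the slash only when there is something to prefix;
    # a completely empty selector is left empty.
    if selector and not selector.startswith("/"):
        selector = "/" + selector
    return host, selector


print(split_host("//python.org?spam"))  # ('python.org', '/?spam')
print(split_host("//python.org"))       # ('python.org', '')
```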
msg122121 - Author: Wes Chow (weschow) Date: 2010-11-22 13:18
This same bug also exists in HTTPClient, and my patch addresses that. Addressing it in HTTPClient has a side effect of taking care of it for urllib2 as well (and all future libraries that use HTTPClient). Even if the urllib2 patch is preferable, shouldn't we fix the problem in HTTPClient as well?
msg124185 - Author: Senthil Kumaran (orsenthil) (Python committer) Date: 2010-12-17 05:32
Wes, I forgot to address your last comment. HTTPClient follows the HTTP spec for requests and responses. When it is used, the request is made on the path, and the code there checks whether a path was given; if not, it does a request on '/'. It is not appropriate to pass invalid URLs to httpclient; handling and correcting invalid URLs belongs at a much higher level. That's why I made those changes in urllib.
History
Date User Action Args
2022-04-11 14:56:42 admin set github: 48743
2010-12-17 05:32:46 orsenthil set nosy: jjlee, orsenthil, dstanek, flox, weschow; messages: +
2010-11-22 13:18:42 weschow set messages: +
2010-11-22 05:06:55 orsenthil set status: open -> closed; resolution: fixed; messages: +; stage: test needed -> resolved
2010-11-20 22:21:38 weschow set files: + empty-path-4493.patch; nosy: + weschow; messages: +; keywords: + patch
2010-08-04 07:49:23 flox set nosy: + flox
2010-08-01 19:05:27 dstanek set nosy: + dstanek
2010-07-11 05:37:05 orsenthil set assignee: orsenthil
2010-07-10 16:55:02 BreamoreBoy set versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6
2009-04-22 18:47:49 ajaksu2 set priority: normal; keywords: + easy
2009-02-12 19:14:38 ajaksu2 set nosy: + orsenthil; dependencies: + urllib2 can't handle http://www.wikispaces.com; type: behavior; stage: test needed; versions: + Python 2.6
2008-12-02 20:46:11 jjlee create