Issue 5072: urllib.open sends full URL after GET command instead of local path

Created on 2009-01-26 19:22 by olemis, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (9)
msg80586 - Author: Olemis Lang (olemis) Date: 2009-01-26 19:22
Hello ... The first thing I have to say is that I searched the open issues and found nothing similar to what I am about to report. If this ticket is a duplicate, I apologize ...

Yesterday I was testing how to access the wiki pages in a Trac [1]_ site and I realized that something wrong was happening (a bug? ...)

Initially the behavior was as follows:

{{{
#!python
>>> u = urllib.urlopen('http://localhost:8000/trac-dev')
>>> u.read()
'Environment not found'
>>> u.close()
}}}

And tracd reported a line like this:

{{{
127.0.0.1 - - [25/Jan/2009 17:32:08] "GET http://localhost:8000/trac-dev HTTP/1.0" 404 -
}}}

Which means that a 'Not found' error code was sent back to the urllib client. I tried to access the same page from my browser and tracd reported:

{{{
127.0.0.1 - - [25/Jan/2009 18:05:44] "GET /trac-dev HTTP/1.0" 200 -
}}}

The problem is obvious ... urllib was sending the full URL after GET and it should send only the string after the network location.

I applied the following patch to urllib (yours will be better, I am sure about that ;)

{{{
#!diff
--- /usr/lib/python2.5/urllib.py	2008-07-31 13:40:40.000000000 -0500
+++ /media/urllib_unix.py	2009-01-26 09:48:54.000000000 -0500
@@ -270,6 +270,7 @@
     def open_http(self, url, data=None):
         """Use HTTP protocol."""
         import httplib
+        from urlparse import urlparse
         user_passwd = None
         proxy_passwd= None
         if isinstance(url, str):
@@ -312,12 +313,17 @@
         else:
             auth = None
         h = httplib.HTTP(host)
+        target = ''.join(sep + part for sep, part in \
+                            zip(['', ';', '?', '#'], \
+                                urlparse(selector)[2:]) \
+                            if part)
+        print target
         if data is not None:
-            h.putrequest('POST', selector)
+            h.putrequest('POST', target)
             h.putheader('Content-Type', 'application/x-www-form-urlencoded')
             h.putheader('Content-Length', '%d' % len(data))
         else:
-            h.putrequest('GET', selector)
+            h.putrequest('GET', target)
         if proxy_auth: h.putheader('Proxy-Authorization', 'Basic %s' % proxy_auth)
         if auth: h.putheader('Authorization', 'Basic %s' % auth)
         if realhost: h.putheader('Host', realhost)
}}}

And everything was «back» to normal ...

{{{
#!python
>>> u = urllib.urlopen('http://localhost:8000/trac-dev')
>>> u.read()
... # Lots of beautiful HTML code ;)
>>> u.close()
}}}

... tracd output ...

{{{
127.0.0.1 - - [25/Jan/2009 18:05:44] "GET /trac-dev HTTP/1.0" 200 -
}}}

The same picture is shown when using both Python 2.5.1 and 2.5.2 ... I have not installed Python 2.6.x so I am not sure whether this issue has propagated to newer versions of Python ... and I don't know either whether this issue is also present in urllib2 ...

... so further research is needed, but IMO this is a serious bug :(

PS: If this is a bug ... how could it have stayed hidden so far? Is there any test case written to assert this kind of thing? I checked the `test.test_urllib` and `test.test_urllibnet` modules and I saw nothing at all ...

.. [1] Trac (http://trac.edgewall.org)
msg80588 - Author: Olemis Lang (olemis) Date: 2009-01-26 19:28
Ooops ... sorry, remove the print statement. The patch is as follows:

{{{
#!diff
--- /usr/lib/python2.5/urllib.py	2008-07-31 13:40:40.000000000 -0500
+++ /media/urllib_unix.py	2009-01-26 09:48:54.000000000 -0500
@@ -270,6 +270,7 @@
     def open_http(self, url, data=None):
         """Use HTTP protocol."""
         import httplib
+        from urlparse import urlparse
         user_passwd = None
         proxy_passwd= None
         if isinstance(url, str):
@@ -312,12 +313,17 @@
         else:
             auth = None
         h = httplib.HTTP(host)
+        target = ''.join(sep + part for sep, part in \
+                            zip(['', ';', '?', '#'], \
+                                urlparse(selector)[2:]) \
+                            if part)
         if data is not None:
-            h.putrequest('POST', selector)
+            h.putrequest('POST', target)
             h.putheader('Content-Type', 'application/x-www-form-urlencoded')
             h.putheader('Content-Length', '%d' % len(data))
         else:
-            h.putrequest('GET', selector)
+            h.putrequest('GET', target)
         if proxy_auth: h.putheader('Proxy-Authorization', 'Basic %s' % proxy_auth)
         if auth: h.putheader('Authorization', 'Basic %s' % auth)
         if realhost: h.putheader('Host', realhost)
}}}

I apologize once again ...
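For readers who want to see what the `target` expression in the patch computes, here is a minimal sketch that isolates it. It assumes Python 3, where `urlparse` lives in `urllib.parse` rather than the Python 2.5 `urlparse` module used in the patch; the function name `strip_to_local_path` is hypothetical and not part of urllib. Note that the rest of this thread concludes that this rewriting is not appropriate when a proxy is in use.

```python
from urllib.parse import urlparse


def strip_to_local_path(selector):
    """Rebuild path;params?query#fragment, dropping scheme and netloc.

    Hypothetical helper mirroring the generator expression in the patch.
    """
    # urlparse() returns (scheme, netloc, path, params, query, fragment);
    # slicing with [2:] keeps only the last four components.
    parts = urlparse(selector)[2:]
    separators = ['', ';', '?', '#']
    # Re-attach each non-empty component with the separator that precedes it.
    return ''.join(sep + part for sep, part in zip(separators, parts) if part)


print(strip_to_local_path('http://localhost:8000/trac-dev'))
print(strip_to_local_path('http://localhost:8000/wiki/Start?format=txt'))
```

A selector that is already a bare path, such as `/trac-dev`, passes through unchanged, which is why the patch appeared harmless for direct connections.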
msg80600 - Author: Gabriel Genellina (ggenellina) Date: 2009-01-27 00:09
I could not reproduce this issue with either Python 2.6 or 2.5.2. If I print host and selector near line 313, I get 'localhost:8000' and '/trac-dev', the expected results. Do you have an HTTP proxy? Running at the *same* port? (!)
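One way to answer the proxy question is to ask urllib which proxies it has picked up, and to retry the request with proxying disabled. The sketch below is an assumption-laden illustration in Python 3 (`urllib.request`), not the Python 2.5 API used in this report; the helper names `describe_proxy_setup` and `build_direct_opener` are hypothetical, and the commented-out URL assumes a local tracd instance like the reporter's.

```python
import urllib.request


def describe_proxy_setup():
    """Return the scheme -> proxy URL mapping urllib would use by default."""
    # getproxies() reads the *_proxy environment variables (and, on some
    # platforms, system settings).
    return urllib.request.getproxies()


def build_direct_opener():
    """Return an opener that ignores any configured proxy."""
    # Passing an empty mapping to ProxyHandler disables proxying.
    return urllib.request.build_opener(urllib.request.ProxyHandler({}))


if __name__ == '__main__':
    print(describe_proxy_setup())
    opener = build_direct_opener()
    # Assumes a server is listening locally; uncomment to try it.
    # print(opener.open('http://localhost:8000/trac-dev').read())
```

If the mapping contains an `http` entry, requests for `http://` URLs go through that proxy, which would explain a full URL appearing after GET.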
msg80651 - Author: Olemis Lang (olemis) Date: 2009-01-27 14:02
Actually I am using a proxy hosted on some other machine (i.e. not my PC ... sorry, I didn't mention it :S ...)

I «debugged» urllib and, when branching at this point (see below ;) in URLopener.open_http:

{{{
#!python

# urllib.py

    def open_http(self, url, data=None):
        """Use HTTP protocol."""
        import httplib
        user_passwd = None
        proxy_passwd= None
        if isinstance(url, str):        # Branching here !!!!!!!!!!
            host, selector = splithost(url)
            if host:
                user_passwd, host = splituser(host)
                host = unquote(host)
            realhost = host
        else:
            host, selector = url
}}}

the url variable is bound to the following two-element tuple:

{{{
#!python
('172.18.2.7:3128', 'http://localhost:8000/trac-dev')
}}}

My IP is 172.18.2.99 ... so the `else` branch is the one being executed.

If you need further details ... don't hesitate to ask anything you want ;)

PS: What did you mean when you said this?

> Do you have an HTTP proxy? running at the *same* port? (!)

I don't understand this since *I already said* that *I accessed* my Trac environment using my web browser (Opera 9.63, I don't know whether this is relevant at all ... ), *I sent you* the lines output by tracd to stdout (or stderr ... I am not very sure right now ... ;) and *I told you* that, once I applied the patch *I submitted*, everything was *back to normal* ...

I don't understand how all this could be possible if I were running tracd and an HTTP proxy on the *same* port, or even if the `http_proxy` envvar were set to the hostname + port where my Trac instance is listening for incoming connections ...

Anyway ... CMIIW ...

I also checked that immediately before executing the following statements ...

{{{
#!python

# urllib.py

        h = httplib.HTTP(host)
        if data is not None:
            h.putrequest('POST', selector)
            h.putheader('Content-Type', 'application/x-www-form-urlencoded')
            h.putheader('Content-Length', '%d' % len(data))
        else:
            h.putrequest('GET', selector)
}}}

... `selector` is bound to 'http://localhost:8000/trac-dev' ...

BTW the `else` clause *is the one executed* in this case, and this is consistent with the tracd reports *I sent before* and is logical since the `data` arg *is missing* in the code snippet I sent before.
msg80653 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-01-27 14:37
I suppose 172.18.2.7:3128 is the address:port of your proxy, right? In which case, urllib seems to do the right thing. When talking to an HTTP proxy, requests are of the form "GET http://site.com/path", rather than "GET /path". It's up to the proxy to strip the host part of the URL when forwarding the request to the target server. (But I suppose tracd could also be more permissive and allow the "GET http://site.com/path" variant. It seems Apache does.)
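The two request forms described here can be illustrated with a small sketch. This is not urllib's implementation; `request_line` and its `via_proxy` flag are hypothetical names chosen for the illustration, written for Python 3, and the HTTP/1.0 version string simply matches the tracd log lines quoted earlier in the thread.

```python
from urllib.parse import urlsplit


def request_line(method, url, via_proxy, version='HTTP/1.0'):
    """Build the first line of an HTTP request (illustrative only)."""
    if via_proxy:
        # Absolute form: the proxy needs the full URL to find the target.
        target = url
    else:
        # Origin form: path plus optional query, sent straight to the server.
        parts = urlsplit(url)
        target = parts.path or '/'
        if parts.query:
            target += '?' + parts.query
    return '%s %s %s' % (method, target, version)


print(request_line('GET', 'http://localhost:8000/trac-dev', via_proxy=True))
print(request_line('GET', 'http://localhost:8000/trac-dev', via_proxy=False))
```

The first call yields the request line that tracd logged with a 404, and the second yields the one logged when the browser connected directly.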
msg80654 - Author: Olemis Lang (olemis) Date: 2009-01-27 15:11
> Quoting Antoine Pitrou ...
> I suppose 172.18.2.7:3128 is the address:port of your proxy, right?

Yes ...

> In which case, urllib seems to do the right thing. When talking to an HTTP proxy, requests are of the form "GET http://site.com/path", rather than "GET /path". It's up to the proxy to strip the host part of the URL when forwarding the request to the target server.

This being said ...

> (but I suppose tracd could also be more permissive and allow the "GET http://site.com/path" variant. It seems Apache does)

... It works with Apache (I am talking about Trac once again ...), therefore I will report this issue to the Trac devs instead ...

Thnx a lot! Sorry if I caused you any trouble ...
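As a rough idea of what "more permissive" could mean on the server side, a server might reduce an absolute-form request target to its path before looking up the resource. The sketch below is only an assumption about how that could look; `to_origin_form` is a hypothetical helper and is not taken from Trac, tracd, or Apache.

```python
from urllib.parse import urlsplit


def to_origin_form(request_target):
    """Reduce an absolute-form request target to path[?query]."""
    parts = urlsplit(request_target)
    if not parts.scheme:
        # Already origin form, e.g. '/trac-dev'; leave it untouched.
        return request_target
    path = parts.path or '/'
    return path + ('?' + parts.query if parts.query else '')


print(to_origin_form('http://localhost:8000/trac-dev'))
print(to_origin_form('/trac-dev'))
```

With this normalization both request forms seen in the tracd log would resolve to the same resource.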
msg80683 - Author: Gabriel Genellina (ggenellina) Date: 2009-01-28 00:38
> > Do you have an HTTP proxy? running at the *same* port? (!)
>
> I dont understand this since *I already said* that *I accessed* my Trac environment using my web browser (Opera 9.63, I dont know whether this is relevant at all ... ), *I sent you* the lines outputted by tracd to stdout (or stderr ... I am not very sure right now ... ;) and *I told you* that, once I applied the path *I submitted*, everything was *back to normal* ...

If you had configured a proxy at localhost:8000, and *also* a Trac instance at that port, and Trac had "won the race" for the port, then you would observe exactly the symptoms you describe. That is, urllib talking to port 8000 as if it were a proxy, and the Trac instance actually there getting confused.

Your patch, as you surely understand now, is not correct; in fact, the code is OK as it is. urllib builds the request in that specific way *because* it thinks there is a proxy. If the proxy is buggy, misconfigured, or nonexistent, it's not the library's fault :)

-- Gabriel Genellina
msg81798 - Author: Daniel Diniz (ajaksu2) (Python triager) Date: 2009-02-12 18:37
Anyone against closing this as "works for me"?
msg82402 - Author: Senthil Kumaran (orsenthil) (Python committer) Date: 2009-02-18 01:57
Yup, This should be closed too. Thanks.
History
Date User Action Args
2022-04-11 14:56:44 admin set github: 49322
2009-02-18 14:38:21 ajaksu2 set status: pending -> closed
2009-02-18 01:57:32 orsenthil set messages: +
2009-02-18 01:52:33 ajaksu2 set status: open -> pending; priority: low
2009-02-12 18:37:06 ajaksu2 set keywords: + patch; nosy: + ajaksu2, orsenthil; stage: test needed; messages: +; versions: + Python 2.6, - Python 2.5
2009-01-28 00:38:41 ggenellina set messages: +
2009-01-27 15:11:52 olemis set messages: +
2009-01-27 14:37:56 pitrou set nosy: + pitrou; messages: +
2009-01-27 14:02:43 olemis set messages: +
2009-01-27 00:09:17 ggenellina set nosy: + ggenellina; messages: +
2009-01-26 19:28:43 olemis set messages: +
2009-01-26 19:22:53 olemis create