msg234809 - (view) |
Author: Andy Reitz (Andy.Reitz) |
Date: 2015-01-27 07:03 |
On Python 2.7.9, if I set an https_proxy environment variable, where the password contains a '/' character, urllib2 fails.

Given this test code:

import os, urllib
os.environ['http_proxy'] = "http://someuser:a/b@10.11.12.13:1234"
f = urllib.urlopen('http://www.python.org')
data = f.read()
print data

I expect this error message (because my sample proxy is totally bogus):

[areitz@SOMEHOST ~]$ python2.7 test.py
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    f = urllib.urlopen('http://www.python.org')
  File "/usr/lib64/python2.7/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/usr/lib64/python2.7/urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "/usr/lib64/python2.7/urllib.py", line 350, in open_http
    h.endheaders(data)
  File "/usr/lib64/python2.7/httplib.py", line 997, in endheaders
    self._send_output(message_body)
  File "/usr/lib64/python2.7/httplib.py", line 850, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.7/httplib.py", line 812, in send
    self.connect()
  File "/usr/lib64/python2.7/httplib.py", line 793, in connect
    self.timeout, self.source_address)
  File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
    raise err
IOError: [Errno socket error] [Errno 101] Network is unreachable

Instead, I receive this error:

[areitz@SOMEHOST ~]$ python2.7 test.py
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    f = urllib.urlopen('http://www.python.org')
  File "/usr/lib64/python2.7/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/usr/lib64/python2.7/urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "/usr/lib64/python2.7/urllib.py", line 339, in open_http
    h = httplib.HTTP(host)
  File "/usr/lib64/python2.7/httplib.py", line 1107, in __init__
    self._setup(self._connection_class(host, port, strict))
  File "/usr/lib64/python2.7/httplib.py", line 712, in __init__
    (self.host, self.port) = self._get_hostport(host, port)
  File "/usr/lib64/python2.7/httplib.py", line 754, in _get_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
httplib.InvalidURL: nonnumeric port: 'a'

Note that from the error, it seems as if urllib2 is incorrectly parsing the password from the proxy URL. When trying this with curl 7.19.7, I see the proper behavior (the correct password is parsed from the proxy URL). |
|
|
msg234810 - (view) |
Author: Andy Reitz (Andy.Reitz) |
Date: 2015-01-27 07:28 |
Sorry, went a bit too quickly -- here is the sample code that I meant to use:

import os, urllib2
os.environ['http_proxy'] = "http://someuser:a/b@10.11.12.13:1234"
f = urllib2.urlopen('http://www.python.org')
data = f.read()
print data

And the stack trace that I receive:

Traceback (most recent call last):
  File "test.py", line 3, in <module>
    f = urllib2.urlopen('http://www.python.org')
  File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 1227, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.7/urllib2.py", line 1166, in do_open
    h = http_class(host, timeout=req.timeout, **http_conn_args)
  File "/usr/lib64/python2.7/httplib.py", line 712, in __init__
    (self.host, self.port) = self._get_hostport(host, port)
  File "/usr/lib64/python2.7/httplib.py", line 754, in _get_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
httplib.InvalidURL: nonnumeric port: 'a'

It actually looks the same -- so I suppose this issue affects both urllib and urllib2. |
|
|
msg234823 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2015-01-27 15:22 |
Yup, I can confirm that this is a problem. As Andy recognized, there is a parsing error that fails on the '/' character in the password. The environ-based proxies are used by urllib rather than urllib2. (A test case that relies on the environ proxy should use urllib.urlopen().) But the failure comes from parsing done in httplib, so it affects both urllib and urllib2. |
|
|
msg235587 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2015-02-09 04:38 |
Related: Issue 18140. The slash character is meant to be a reserved character in URLs, so why hasn’t it been encoded? Where does the environment variable come from? |
|
|
msg235588 - (view) |
Author: Andy Reitz (Andy.Reitz) |
Date: 2015-02-09 05:10 |
The proxy credentials are supplied by our sysadmin. My understanding is that the http_proxy env variable doesn't require URI encoding. In addition, the same credentials work fine with curl. |
|
|
msg235590 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2015-02-09 05:44 |
The relevant code looks like it is _parse_proxy() at Lib/urllib/request.py:693. It has custom code to search for a slash (/), so it wouldn’t be hard to make it search after the last at (@) symbol. (I previously assumed it would use urlsplit() or similar, which would be harder to adjust.)

Even Curl seems to require an @ symbol in the username or password to be encoded, i.e. the following doesn’t work, so you still need to encode the fields in general to work with Curl.

http_proxy=http://a@x:b@localhost curl . . .
http_proxy=http://a:b@x@localhost curl . . . |
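To illustrate the idea of searching for the slash only after the last @ symbol, here is a minimal Python 3 sketch. It is not the code of _parse_proxy() and not the eventual patch; the function name split_proxy_authority is hypothetical, and the sketch assumes the proxy URL's path does not itself contain an @.

```python
def split_proxy_authority(proxy):
    """Split a proxy URL into (scheme, userinfo, hostport).

    Hypothetical helper for illustration only; not CPython's _parse_proxy().
    """
    scheme, sep, rest = proxy.partition("://")
    if not sep:
        # No scheme given: treat the whole string as the authority.
        scheme, rest = None, proxy
    # Look for the end of the authority only after the last '@', so a
    # literal '/' inside the user name or password is not mistaken for
    # the start of the path.
    at = rest.rfind("@")
    end = rest.find("/", at + 1)
    authority = rest if end == -1 else rest[:end]
    userinfo, has_at, hostport = authority.rpartition("@")
    if not has_at:
        userinfo = None
    return scheme, userinfo, hostport


print(split_proxy_authority("http://someuser:a/b@10.11.12.13:1234"))
```

With the proxy URL from the original report, the user information comes out as someuser:a/b and the host and port as 10.11.12.13:1234, so the port is no longer read from the password.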
|
|
msg235628 - (view) |
Author: Panagiotis Issaris (takis) |
Date: 2015-02-09 20:19 |
RFC3986 seems to state that a '/' character should be encoded:

"""
...
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
...
The user information, if present, is followed by a commercial at-sign ("@") that delimits it from the host.
userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
""" |
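The userinfo rule quoted above can be checked mechanically. The following Python 3 sketch is an illustration, not part of any library: the name is_valid_userinfo is hypothetical, and the unreserved set (letters, digits, "-", ".", "_", "~") is taken from RFC 3986 section 2.3 rather than from the excerpt.

```python
import re

# userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
USERINFO_RE = re.compile(
    r"(?:[A-Za-z0-9._~-]"      # unreserved
    r"|%[0-9A-Fa-f]{2}"        # pct-encoded
    r"|[!$&'()*+,;=:])*"       # sub-delims and ":"
)


def is_valid_userinfo(userinfo):
    """Return True if the string matches the RFC 3986 userinfo grammar."""
    return USERINFO_RE.fullmatch(userinfo) is not None


print(is_valid_userinfo("someuser:a/b"))    # literal slash
print(is_valid_userinfo("someuser:a%2Fb"))  # percent-encoded slash
```

Under this grammar the literal slash in someuser:a/b is rejected, while the percent-encoded form someuser:a%2Fb is accepted.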
|
|
msg235631 - (view) |
Author: Andy Reitz (Andy.Reitz) |
Date: 2015-02-09 21:04 |
Sure, but the question is who should do the encoding -- the user, or python? I think it would be better for python to read the password from the environment variable, and encode it before using it. I think this is what users expect. |
|
|
msg235649 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2015-02-10 00:16 |
To comply with the RFC on URLs, whoever is setting the environment variable _should_ do the encoding, and then Python will _decode_ it. But I suspect this case is more about how Python should handle an environment variable that hasn’t been encoded correctly. |
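As a sketch of that division of labour in Python 3: whoever sets the variable percent-encodes the credentials, and the consumer decodes them after splitting the URL. The helper names build_proxy_url and proxy_credentials are hypothetical; quote(), unquote() and urlsplit() are the standard urllib.parse functions.

```python
from urllib.parse import quote, unquote, urlsplit


def build_proxy_url(user, password, hostport, scheme="http"):
    """Encode credentials the way the setter of http_proxy would."""
    # safe="" makes quote() escape '/' too (it is left alone by default).
    return "{}://{}:{}@{}".format(
        scheme, quote(user, safe=""), quote(password, safe=""), hostport
    )


def proxy_credentials(proxy_url):
    """Decode credentials the way a consumer of http_proxy would."""
    parts = urlsplit(proxy_url)
    # urlsplit() does not percent-decode, so decode explicitly.
    return unquote(parts.username or ""), unquote(parts.password or "")


url = build_proxy_url("someuser", "a/b", "10.11.12.13:1234")
print(url)
print(proxy_credentials(url))
```

Here the password a/b is written into the URL as a%2Fb and recovered as a/b on the reading side.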
|
|
msg235666 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2015-02-10 05:16 |
In the initial report, I thought it was mentioned that curl reads the same http_proxy variable properly. It would be good to have a correct curl test case to ascertain that. But in all the places where the @ character is allowed in URLs (netrc, git configs), I see that @ should be encoded. In that case, this bug report is more about detecting bad URLs and presenting a better error message. |
|
|
msg235673 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2015-02-10 06:58 |
This should demonstrate that Curl does parse literal slashes in the username and password fields:

$ http_proxy=http://user/name:pass/word@localhost:22 curl -v http://example.net/
* Trying ::1...
* Connected to localhost (::1) port 22 (#0)
* Proxy auth using Basic with user 'user/name'
> GET http://example.net/ HTTP/1.1
> Proxy-Authorization: Basic dXNlci9uYW1lOnBhc3Mvd29yZA==
> User-Agent: curl/7.40.0
> Host: example.net
> Accept: */*
> Connection: TE
> TE: gzip
> Proxy-Connection: Keep-Alive
>
SSH-2.0-OpenSSH_6.2
Protocol mismatch.
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer
[Exit 56]
$ base64 -d <<< dXNlci9uYW1lOnBhc3Mvd29yZA==
user/name:pass/word$ |
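The Proxy-Authorization value in the Curl output above can be reproduced in Python 3 with the standard base64 module. This is only a sketch of how a Basic credential is formed from a user name and password; the function name basic_proxy_auth is hypothetical.

```python
import base64


def basic_proxy_auth(user, password):
    """Build a Basic credential from 'user:password', slashes included."""
    raw = "{}:{}".format(user, password).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


print(basic_proxy_auth("user/name", "pass/word"))
```

For user/name and pass/word this yields the same token that Curl sent, which confirms that Curl passed the slashes through unchanged.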
|
|
msg383881 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2020-12-28 02:54 |
https://github.com/python/cpython/pull/23973 will resolve this issue. The issue was localized to _parse_proxy method in urllib2. |
|
|
msg383998 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2020-12-29 13:17 |
Merged in
3.10 - https://github.com/python/cpython/commit/030a713183084594659aefd77b76fe30178e23c8
3.9 - https://github.com/python/cpython/commit/df794406a8803e3d6062af8404d7564833f9af28
3.8 - https://github.com/python/cpython/commit/741f22df24ca61db38b5a7a2a58b5939b7154a01 |
|
|