Issue 23328: urllib.request fails for proxy credentials that contain a '/' character

Created on 2015-01-27 07:03 by Andy.Reitz, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
PR 23973: merged (linked by orsenthil, 2020-12-28 02:53)
PR 23992: merged (linked by miss-islington, 2020-12-29 12:21)
PR 23993: merged (linked by miss-islington, 2020-12-29 12:22)
Messages (13)
msg234809 - (view) Author: Andy Reitz (Andy.Reitz) Date: 2015-01-27 07:03
On Python 2.7.9, if I set an https_proxy environment variable where the password contains a '/' character, urllib2 fails. Given this test code:

    import os, urllib
    os.environ['http_proxy'] = "http://someuser:a/b@10.11.12.13:1234"
    f = urllib.urlopen('http://www.python.org')
    data = f.read()
    print data

I expect this error message (because my sample proxy is totally bogus):

    [areitz@SOMEHOST ~]$ python2.7 test.py
    Traceback (most recent call last):
      File "test.py", line 3, in <module>
        f = urllib.urlopen('http://www.python.org')
      File "/usr/lib64/python2.7/urllib.py", line 87, in urlopen
        return opener.open(url)
      File "/usr/lib64/python2.7/urllib.py", line 213, in open
        return getattr(self, name)(url)
      File "/usr/lib64/python2.7/urllib.py", line 350, in open_http
        h.endheaders(data)
      File "/usr/lib64/python2.7/httplib.py", line 997, in endheaders
        self._send_output(message_body)
      File "/usr/lib64/python2.7/httplib.py", line 850, in _send_output
        self.send(msg)
      File "/usr/lib64/python2.7/httplib.py", line 812, in send
        self.connect()
      File "/usr/lib64/python2.7/httplib.py", line 793, in connect
        self.timeout, self.source_address)
      File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
        raise err
    IOError: [Errno socket error] [Errno 101] Network is unreachable

Instead, I receive this error:

    [areitz@SOMEHOST ~]$ python2.7 test.py
    Traceback (most recent call last):
      File "test.py", line 3, in <module>
        f = urllib.urlopen('http://www.python.org')
      File "/usr/lib64/python2.7/urllib.py", line 87, in urlopen
        return opener.open(url)
      File "/usr/lib64/python2.7/urllib.py", line 213, in open
        return getattr(self, name)(url)
      File "/usr/lib64/python2.7/urllib.py", line 339, in open_http
        h = httplib.HTTP(host)
      File "/usr/lib64/python2.7/httplib.py", line 1107, in __init__
        self._setup(self._connection_class(host, port, strict))
      File "/usr/lib64/python2.7/httplib.py", line 712, in __init__
        (self.host, self.port) = self._get_hostport(host, port)
      File "/usr/lib64/python2.7/httplib.py", line 754, in _get_hostport
        raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
    httplib.InvalidURL: nonnumeric port: 'a'

Note that from the error, it seems as if urllib2 is incorrectly parsing the password from the proxy URL. When trying this with curl 7.19.7, I see the proper behavior (the correct password is parsed from the proxy URL).
msg234810 - (view) Author: Andy Reitz (Andy.Reitz) Date: 2015-01-27 07:28
Sorry, went a bit too quickly -- here is the sample code that I meant to use:

    import os, urllib2
    os.environ['http_proxy'] = "http://someuser:a/b@10.11.12.13:1234"
    f = urllib2.urlopen('http://www.python.org')
    data = f.read()
    print data

And the stack trace that I receive:

    Traceback (most recent call last):
      File "test.py", line 3, in <module>
        f = urllib2.urlopen('http://www.python.org')
      File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
        return opener.open(url, data, timeout)
      File "/usr/lib64/python2.7/urllib2.py", line 431, in open
        response = self._open(req, data)
      File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
        '_open', req)
      File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
        result = func(*args)
      File "/usr/lib64/python2.7/urllib2.py", line 1227, in http_open
        return self.do_open(httplib.HTTPConnection, req)
      File "/usr/lib64/python2.7/urllib2.py", line 1166, in do_open
        h = http_class(host, timeout=req.timeout, **http_conn_args)
      File "/usr/lib64/python2.7/httplib.py", line 712, in __init__
        (self.host, self.port) = self._get_hostport(host, port)
      File "/usr/lib64/python2.7/httplib.py", line 754, in _get_hostport
        raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
    httplib.InvalidURL: nonnumeric port: 'a'

It actually looks the same -- so I suppose this issue affects both urllib and urllib2.
msg234823 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2015-01-27 15:22
Yup, I can confirm that this is a problem. As Andy recognized, there is a parsing error that fails on the '/' character in the password. The environ-based proxies are used by urllib rather than urllib2 (a test case that relies on the environ proxy should use urllib.urlopen()), but the failure comes from parsing done in httplib, so it affects both urllib and urllib2.
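The error text in the traceback can be reproduced with a minimal sketch, assuming the proxy URL is cut at the first '/' after the scheme and the port is then read from whatever follows the last ':'. The helper names naive_authority and naive_hostport are hypothetical and only approximate the standard library's behaviour; they are not its actual code.

```python
def naive_authority(proxy_url):
    """Return the authority part, cutting at the first '/' after 'scheme://'."""
    scheme, sep, rest = proxy_url.partition("://")
    if not sep:
        raise ValueError("proxy URL needs a scheme: %r" % proxy_url)
    end = rest.find("/")
    return rest if end < 0 else rest[:end]


def naive_hostport(authority):
    """Split 'host:port' at the last ':' and insist on a numeric port."""
    host, sep, port = authority.rpartition(":")
    if not sep:
        return authority, None
    if not port.isdigit():
        raise ValueError("nonnumeric port: %r" % port)
    return host, int(port)


# The '/' inside the password ends the authority too early...
authority = naive_authority("http://someuser:a/b@10.11.12.13:1234")
print(authority)  # someuser:a

# ...so the text after the last ':' is 'a', not a port number.
try:
    naive_hostport(authority)
except ValueError as exc:
    print(exc)  # nonnumeric port: 'a'
```

With the '@' lost from the truncated authority, nothing marks "someuser:a" as credentials, which is why the 'a' ends up being treated as a port.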
msg235587 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-09 04:38
Related: Issue 18140. The slash character is meant to be a reserved character in URLs, so why hasn’t it been encoded? Where does the environment variable come from?
msg235588 - (view) Author: Andy Reitz (Andy.Reitz) Date: 2015-02-09 05:10
The proxy credentials are supplied by our sysadmin. My understanding is that the http_proxy env variable doesn't require URI encoding. In addition, the same credentials work fine with curl.
msg235590 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-09 05:44
The relevant code looks like it is _parse_proxy() at Lib/urllib/request.py:693. It has custom code to search for a slash (/), so it wouldn’t be hard to make it search after the last at (@) symbol. (I previously assumed it would use urlsplit() or similar, which would be harder to adjust.)

Even Curl seems to require an @ symbol in the username or password to be encoded, i.e. the following doesn’t work, so you still need to encode the fields in general to work with Curl.

    http_proxy=http://a@x:b@localhost curl . . .
    http_proxy=http://a:b@x@localhost curl . . .
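The idea of searching for the slash only after the last '@' can be sketched as follows. This is an illustration under stated assumptions, not the code of _parse_proxy() or of the patch that was eventually merged; split_proxy_authority is a hypothetical name, and the sketch would be misled by an '@' appearing in a path after the host.

```python
def split_proxy_authority(proxy_url):
    """Split a proxy URL into (scheme, user, password, hostport).

    The path separator is searched for only after the last '@', so a
    literal '/' inside the credentials does not end the authority.
    """
    scheme, sep, rest = proxy_url.partition("://")
    if not sep:
        raise ValueError("proxy URL needs a scheme: %r" % proxy_url)

    at = rest.rfind("@")            # -1 when there is no userinfo
    end = rest.find("/", at + 1)    # first '/' after the last '@'
    authority = rest if end < 0 else rest[:end]

    userinfo, sep, hostport = authority.rpartition("@")
    if not sep:
        return scheme, None, None, hostport

    user, sep, password = userinfo.partition(":")
    if not sep:
        password = None
    return scheme, user, password, hostport


print(split_proxy_authority("http://someuser:a/b@10.11.12.13:1234"))
# ('http', 'someuser', 'a/b', '10.11.12.13:1234')
```

An unencoded '@' in the credentials is still ambiguous with this approach, which matches the observation that such fields need encoding in general.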
msg235628 - (view) Author: Panagiotis Issaris (takis) Date: 2015-02-09 20:19
RFC3986 seems to state that a '/' character should be encoded:

"""
...
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
...
The user information, if present, is followed by a commercial at-sign ("@") that delimits it from the host.
userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
"""
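The userinfo rule quoted above can be checked mechanically. The following is a rough sketch, assuming the RFC 3986 definitions of unreserved (letters, digits, "-", ".", "_", "~") and pct-encoded ("%" followed by two hex digits), which are not part of the quoted excerpt; is_valid_userinfo is a hypothetical helper name.

```python
import re

# userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
USERINFO_RE = re.compile(
    r"(?:[A-Za-z0-9\-._~]"   # unreserved
    r"|%[0-9A-Fa-f]{2}"      # pct-encoded
    r"|[!$&'()*+,;=]"        # sub-delims
    r"|:)*"                  # literal colon
)


def is_valid_userinfo(userinfo):
    """Return True if the string matches the RFC 3986 userinfo rule."""
    return USERINFO_RE.fullmatch(userinfo) is not None


print(is_valid_userinfo("someuser:a/b"))    # False: bare '/' is a gen-delim
print(is_valid_userinfo("someuser:a%2Fb"))  # True: '/' is percent-encoded
```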
msg235631 - (view) Author: Andy Reitz (Andy.Reitz) Date: 2015-02-09 21:04
Sure, but the question is who should do the encoding -- the user, or python? I think it would be better for python to read the password from the environment variable, and encode it before using it. I think this is what users expect.
msg235649 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-10 00:16
To comply with the RFC on URLs, whoever is setting the environment variable _should_ do the encoding, and then Python will _decode_ it. But I suspect this case is more about how Python should handle an environment variable that hasn’t been encoded correctly.
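For whoever sets the environment variable, the encoding step could look like the sketch below. It uses urllib.parse.quote with safe="" so that '/' is escaped as well (quote leaves '/' untouched by default); build_proxy_url is a hypothetical helper, and the credentials are the sample values from the report.

```python
from urllib.parse import quote, unquote


def build_proxy_url(scheme, user, password, hostport):
    """Build a proxy URL with percent-encoded credentials."""
    return "%s://%s:%s@%s" % (
        scheme,
        quote(user, safe=""),       # safe="" also escapes '/', '@' and ':'
        quote(password, safe=""),
        hostport,
    )


url = build_proxy_url("http", "someuser", "a/b", "10.11.12.13:1234")
print(url)               # http://someuser:a%2Fb@10.11.12.13:1234

# Decoding on the consumer side recovers the original password.
print(unquote("a%2Fb"))  # a/b
```

The resulting string is what would go into http_proxy; the decode step is the part described above as Python's responsibility.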
msg235666 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2015-02-10 05:16
In the initial report, I thought it was mentioned that curl reads the same http_proxy variable properly. It would be good to have a correct curl test case to ascertain that. But in all the places where the @ character is allowed in URLs (netrc, git configs), I see that @ should be encoded. In that case, this bug report is more about detecting bad URLs and presenting a better error message.
msg235673 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-10 06:58
This should demonstrate that Curl does parse literal slashes in the username and password fields:

    $ http_proxy=http://user/name:pass/word@localhost:22 curl -v http://example.net/
    * Trying ::1...
    * Connected to localhost (::1) port 22 (#0)
    * Proxy auth using Basic with user 'user/name'
    > GET http://example.net/ HTTP/1.1
    > Proxy-Authorization: Basic dXNlci9uYW1lOnBhc3Mvd29yZA==
    > User-Agent: curl/7.40.0
    > Host: example.net
    > Accept: */*
    > Connection: TE
    > TE: gzip
    > Proxy-Connection: Keep-Alive
    >
    SSH-2.0-OpenSSH_6.2
    Protocol mismatch.
    * Recv failure: Connection reset by peer
    * Closing connection 0
    curl: (56) Recv failure: Connection reset by peer
    [Exit 56]
    $ base64 -d <<< dXNlci9uYW1lOnBhc3Mvd29yZA==
    user/name:pass/word$
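The Proxy-Authorization value in the curl output is the base64 form of "user:password", as the base64 -d check shows. The same value can be produced in Python with a small sketch; basic_proxy_authorization is a hypothetical helper name, and UTF-8 is assumed for the credential encoding.

```python
import base64


def basic_proxy_authorization(user, password):
    """Return a Basic credential string built from 'user:password'."""
    raw = ("%s:%s" % (user, password)).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


print(basic_proxy_authorization("user/name", "pass/word"))
# Basic dXNlci9uYW1lOnBhc3Mvd29yZA==
```

The slashes need no special treatment at this stage; they only matter while the proxy URL is being parsed.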
msg383881 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2020-12-28 02:54
https://github.com/python/cpython/pull/23973 will resolve this issue. The issue was localized to _parse_proxy method in urllib2.
msg383998 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2020-12-29 13:17
Merged in:
3.10 - https://github.com/python/cpython/commit/030a713183084594659aefd77b76fe30178e23c8
3.9 - https://github.com/python/cpython/commit/df794406a8803e3d6062af8404d7564833f9af28
3.8 - https://github.com/python/cpython/commit/741f22df24ca61db38b5a7a2a58b5939b7154a01
History
Date User Action Args
2022-04-11 14:58:12 admin set github: 67517
2020-12-29 13:17:51 orsenthil set status: open -> closed; resolution: fixed; messages: +; stage: patch review -> resolved
2020-12-29 12:22:56 orsenthil set versions: + Python 3.8, Python 3.9
2020-12-29 12:22:01 miss-islington set pull_requests: + pull_request22835
2020-12-29 12:21:50 miss-islington set nosy: + miss-islington; pull_requests: + pull_request22834
2020-12-28 02:54:56 orsenthil set messages: +
2020-12-28 02:53:51 orsenthil set keywords: + patch; stage: needs patch -> patch review; pull_requests: + pull_request22818
2020-12-23 21:06:16 orsenthil set title: urllib2 fails for proxy credentials that contain a '/' character -> urllib.request fails for proxy credentials that contain a '/' character; versions: + Python 3.10, - Python 2.7
2015-03-07 02:36:51 demian.brecht set nosy: + demian.brecht
2015-02-10 06:58:07 martin.panter set messages: +
2015-02-10 05:16:42 orsenthil set messages: +
2015-02-10 00:16:25 martin.panter set messages: +
2015-02-09 21:04:13 Andy.Reitz set messages: +
2015-02-09 20:19:54 takis set nosy: + takis; messages: +
2015-02-09 05:44:16 martin.panter set messages: +
2015-02-09 05:10:46 Andy.Reitz set messages: +
2015-02-09 04:38:52 martin.panter set nosy: + martin.panter; messages: +
2015-01-27 15:22:45 orsenthil set nosy: + orsenthil; messages: +; assignee: orsenthil; stage: needs patch
2015-01-27 07:28:54 Andy.Reitz set messages: +
2015-01-27 07:03:55 Andy.Reitz create