Issue 33342: urllib IPv6 parsing fails with special characters in passwords


Created on 2018-04-23 13:44 by benaryorg, last changed 2022-04-11 14:58 by admin.

Messages (7)
msg315668 - Author: benaryorg (benaryorg) Date: 2018-04-23 13:44
The documentation says urllib.parse follows RFC 2396 (https://tools.ietf.org/html/rfc2396.html), but urllib.parse.urlsplit (https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlsplit) fails to parse a user:password@host URL when the password contains a '[' character. This is because the urlsplit code does not strip the userinfo part of the authority (everything from the start of the netloc up to and including the last '@') before checking whether the hostname contains '[' to detect an IPv6 address (https://github.com/python/cpython/blob/8a6f4b4bba950fb8eead1b176c58202d773f2f70/Lib/urllib/parse.py#L416-L418).
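As a rough illustration of the behaviour described in this report, the bracket check could be applied to the host part only. This is a minimal sketch, not the code in Lib/urllib/parse.py; the helper name check_host_brackets is hypothetical.

```python
def check_host_brackets(netloc: str) -> None:
    """Hypothetical helper: validate '[' / ']' pairing in the host part only."""
    # Drop the userinfo: everything up to and including the last '@'.
    # If there is no '@', rpartition leaves the whole netloc in hostport.
    _, _, hostport = netloc.rpartition("@")
    # An IPv6 literal needs both brackets; exactly one of them is an error.
    if ("[" in hostport) != ("]" in hostport):
        raise ValueError("Invalid IPv6 URL")


check_host_brackets("user:[@host")        # accepted: the '[' is in the password
check_host_brackets("user:pw@[::1]:8080")  # accepted: well-formed IPv6 host
```

With this approach a stray bracket in the password no longer triggers the IPv6 check, while an unbalanced bracket in the host still raises ValueError.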
msg317119 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2018-05-19 13:49
I presume this is about parsing a URL like

    >>> urlsplit("//user:[@host")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/proj/python/cpython/Lib/urllib/parse.py", line 431, in urlsplit
        raise ValueError("Invalid IPv6 URL")
    ValueError: Invalid IPv6 URL

Ideally the square bracket should be escaped as %5B. Related reports about parsing unescaped delimiters in a URL password are Issue 18140 (fragment #, query ?) and Issue 23328 (slash /).
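The %5B escaping mentioned here can be done with urllib.parse.quote before the URL is assembled. A minimal sketch, assuming an illustrative password value; passing safe="" makes quote escape every reserved character, including '/'.

```python
from urllib.parse import quote, urlsplit

password = "[secret"                     # illustrative value, not from the report
escaped = quote(password, safe="")       # percent-encode all reserved characters
url = f"//user:{escaped}@host"

parts = urlsplit(url)
print(escaped)          # %5Bsecret
print(parts.hostname)   # host
print(parts.password)   # %5Bsecret (urlsplit does not decode it)
```

Note that urlsplit returns the password still percent-encoded, so the caller has to decode it separately.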
msg327239 - Author: Thomas Jollans (tjollans) Date: 2018-10-06 09:43
RFC 2396 explicitly excludes the use of [ and ] in URLs. RFC 2732 <https://www.ietf.org/rfc/rfc2732.txt> defines the syntax for IPv6 URLs, and allows [ and ] ONLY in the host part. So I'd say that the behaviour is arguably correct (if somewhat unfortunate).
msg334273 - Author: Terrence Brannon (metaperl) Date: 2019-01-23 21:37
I would like to add to this bug: the password field of the URL cannot contain a pound sign (#) or a question mark (?) either. If it does, the parser splits the URL in the wrong place, as this gist demonstrates: https://gist.github.com/metaperl/fc6f43bf6b9a9f874b8f27e29695e68c
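The effect of an unescaped '#' in the password can be reproduced in a few lines. This is a small sketch with made-up URLs (not the contents of the linked gist): urlsplit ends the network location at the first '/', '?' or '#', so the raw '#' cuts the password short, while the %23 form parses as intended.

```python
from urllib.parse import urlsplit

# Raw '#' in the password: the netloc is cut at the '#'.
raw = urlsplit("http://user:pa#ss@host/path")
print(raw.netloc)     # user:pa
print(raw.fragment)   # ss@host/path

# Same password with '#' percent-encoded as %23.
enc = urlsplit("http://user:pa%23ss@host/path")
print(enc.netloc)     # user:pa%23ss@host
print(enc.hostname)   # host
print(enc.path)       # /path
```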
msg334302 - Author: Terrence Brannon (metaperl) Date: 2019-01-24 15:55
Also, if SQLAlchemy's behaviour is any guidance, note that it unquotes both the username and the password of the URL: https://github.com/sqlalchemy/sqlalchemy/blob/master/lib/sqlalchemy/engine/url.py#L274
msg334303 - Author: Terrence Brannon (metaperl) Date: 2019-01-24 15:59
Regarding "RFC 2396 explicitly excludes the use of [ and ] in URLs. RFC 2732 <https://www.ietf.org/rfc/rfc2732.txt> defines the syntax for IPv6 URLs, and allows [ and ] ONLY in the host part. So I'd say that the behaviour is arguably correct (if somewhat unfortunate)" I would say that a square bracket CAN be used in the password, but that it should be urlencoded and that this library should perform a urldecode for both username and password, just as SQLAlchemy does.
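The decoding step proposed here can be approximated on the caller's side with urllib.parse.unquote. A minimal sketch, assuming a hypothetical helper name decoded_credentials and an illustrative URL; it does not reproduce SQLAlchemy's implementation.

```python
from typing import Optional, Tuple
from urllib.parse import unquote, urlsplit


def decoded_credentials(url: str) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: return (username, password) with %XX escapes decoded."""
    parts = urlsplit(url)
    # username/password are None when the URL carries no userinfo.
    user = unquote(parts.username) if parts.username is not None else None
    password = unquote(parts.password) if parts.password is not None else None
    return user, password


print(decoded_credentials("postgresql://us%40er:p%5Bw%23d@db.example/app"))
# ('us@er', 'p[w#d')
```

Decoding only after the split keeps the delimiters unambiguous: '@', '[' and '#' reach the application intact without confusing the parser.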
msg354745 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-15 17:08
I modified my PR 16780 to also fix this issue. The PR was originally written for bpo-36338.
History
Date User Action Args
2022-04-11 14:58:59 admin set github: 77523
2019-10-15 17:08:55 vstinner set messages: +
2019-10-15 16:24:08 xtreak set nosy: + vstinner
2019-01-24 15:59:49 metaperl set messages: +
2019-01-24 15:55:55 metaperl set messages: +
2019-01-23 21:37:03 metaperl set nosy: + metaperl; messages: +
2018-10-06 09:43:45 tjollans set nosy: + tjollans; messages: +
2018-05-19 13:49:36 martin.panter set nosy: + martin.panter; messages: +
2018-04-23 13:44:45 benaryorg set type: behavior
2018-04-23 13:44:30 benaryorg create