urllib.request.url2pathname() mishandles empty authority sections (mostly) · Issue #126766 · python/cpython
Bug report
Bug description:
File URIs that start with 3+ slashes should be parsed as having an empty authority section (ref), but urllib.request.url2pathname() incorrectly retains the slashes introducing the authority section. This means it can't properly parse the most common form of POSIX absolute file URIs (e.g. file:///etc/hosts).
On Windows, url2pathname() correctly discards slashes before DOS drives (so file:///c:/foo is parsed as c:\foo), and before old-fashioned UNC URIs (so file:////server/share is parsed as \\server\share), but incorrectly retains slashes if a rooted, driveless path is decoded (so file:///foo/bar is decoded as \\\foo\bar instead of \foo\bar). This is much less of a problem because such paths are rare on Windows.
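For illustration only, the three Windows cases described above could be sketched as follows. The helper name url_to_windows_path is hypothetical and is not part of urllib or of any linked PR; it only mirrors the expected results listed in this report and runs on any platform because it works on plain strings.

```python
from urllib.parse import unquote


def url_to_windows_path(url_path: str) -> str:
    """Hypothetical sketch of the expected Windows decoding (not CPython's code)."""
    if url_path.startswith('///'):
        # '//' introduces an (empty) authority; the path starts at the third slash.
        url_path = url_path[2:]
    # A leading slash before a DOS drive ('/c:/foo') is not part of the path.
    if (len(url_path) >= 3 and url_path[0] == '/'
            and url_path[1].isalpha() and url_path[2] == ':'):
        url_path = url_path[1:]
    # Decode percent-escapes, then switch to backslash separators.
    return unquote(url_path).replace('/', '\\')


print(url_to_windows_path('///c:/foo'))         # c:\foo
print(url_to_windows_path('////server/share'))  # \\server\share
print(url_to_windows_path('///foo/bar'))        # \foo\bar
```

The sketch assumes the "file:" scheme has already been removed and ignores non-empty authorities such as "localhost".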
>>> from urllib.request import url2pathname
>>> url2pathname('///etc/hosts')
'///etc/hosts'  # expected: '/etc/hosts'
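A minimal sketch of the expected POSIX behaviour, assuming the only change needed is to drop the two slashes that introduce an empty authority. The helper name strip_empty_authority is hypothetical and is not the function touched by the linked PRs.

```python
from urllib.parse import unquote


def strip_empty_authority(url_path: str) -> str:
    """Hypothetical sketch: decode a file URL path with an empty authority."""
    if url_path.startswith('///'):
        # '//' + empty authority + '/path': the path begins at the third slash.
        url_path = url_path[2:]
    # Percent-decode whatever remains.
    return unquote(url_path)


print(strip_empty_authority('///etc/hosts'))  # /etc/hosts
print(strip_empty_authority('/etc/hosts'))    # /etc/hosts (no authority, unchanged)
```

Inputs with a non-empty authority (for example "//localhost/etc/hosts") are deliberately left alone here; they are outside the scope of this sketch.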
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux, Windows
Linked PRs
- GH-126766: url2pathname(): handle empty authority section. #126767
- [3.13] GH-126766: url2pathname(): handle empty authority section. (GH-126767) #126836
- [3.12] GH-126766: url2pathname(): handle empty authority section. (GH-126767) #126837
- GH-126766: url2pathname(): handle 'localhost' authority #127129
- [3.13] GH-126766: url2pathname(): handle 'localhost' authority (GH-127129) #127130
- [3.12] GH-126766: url2pathname(): handle 'localhost' authority (GH-127129) #127131