urllib.request.url2pathname() mishandles empty authority sections (mostly) · Issue #126766 · python/cpython

Bug report

Bug description:

File URIs that start with 3+ slashes should be parsed as having an empty authority section (ref), but urllib.request.url2pathname() incorrectly retains the slashes introducing the authority section. This means it can't properly parse the most common form of POSIX absolute file URIs (e.g. file:///etc/hosts).
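For comparison, `urllib.parse.urlsplit()` already separates the authority from the path, so a short sketch can show what an "empty authority" looks like once parsed. This only illustrates the expected parse; it says nothing about how `url2pathname()` is implemented. The variable names are illustrative.

```python
from urllib.parse import urlsplit

# Three slashes: "//" introduces an (empty) authority,
# and the third slash is the start of the path.
empty_auth = urlsplit('file:///etc/hosts')
print(repr(empty_auth.netloc))  # ''
print(empty_auth.path)          # /etc/hosts

# A non-empty authority is split off in the same way.
with_host = urlsplit('file://localhost/etc/hosts')
print(with_host.netloc)         # localhost
print(with_host.path)           # /etc/hosts
```

In both cases the path component is `/etc/hosts`, which is the result the report expects once the authority section has been discarded.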

On Windows, url2pathname() correctly discards slashes before DOS drives (so file:///c:/foo is parsed as c:\foo), and before old-fashioned UNC URIs (so file:////server/share is parsed as \\server\share), but incorrectly retains slashes if a rooted, driveless path is decoded (so file:///foo/bar is decoded as \\\foo\bar instead of \foo\bar). This is much less of a problem because such paths are rare on Windows.
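The expected results described above can be sketched with two small helpers. Both `strip_empty_authority()` and `expected_windows_path()` are hypothetical names written for this illustration, not part of `urllib`, and the sketch deliberately skips percent-decoding and non-empty authorities. It is a statement of the expected mapping under those assumptions, not a proposed patch.

```python
import re


def strip_empty_authority(url_path):
    """Hypothetical helper: drop the '//' that introduces an empty authority.

    '///etc/hosts' becomes '/etc/hosts'; input that does not start with
    three slashes is returned unchanged.
    """
    if url_path.startswith('///'):
        return url_path[2:]
    return url_path


def expected_windows_path(url_path):
    """Hypothetical helper: the Windows results this report expects."""
    path = strip_empty_authority(url_path)
    # '/c:/foo' -> 'c:/foo': the slash before a DOS drive is not part of the path.
    if re.match(r'/[A-Za-z]:', path):
        path = path[1:]
    # Convert URL separators to Windows separators.
    return path.replace('/', '\\')


print(strip_empty_authority('///etc/hosts'))      # /etc/hosts
print(expected_windows_path('///c:/foo'))         # c:\foo
print(expected_windows_path('////server/share'))  # \\server\share
print(expected_windows_path('///foo/bar'))        # \foo\bar
```

The last call is the rooted, driveless case: the expected result has a single leading backslash rather than the three that are currently retained.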

>>> from urllib.request import url2pathname
>>> url2pathname('///etc/hosts')
'///etc/hosts'  # expected: '/etc/hosts'

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux, Windows

Linked PRs