File:// URIs in Python
I’d like to tour the standard library’s existing support for file URIs and then make a proposal.
urllib
The urllib.request module has longstanding support for parsing and generating file URIs with pathname2url() and url2pathname(). The implementation depends on the current platform. To always use Windows semantics, one can import the same functions from the undocumented nturl2path module. On POSIX, one can call urllib.parse.quote() and unquote(). Bugs and discussion:
- #85168 - on POSIX, uses UTF-8 rather than local filesystem encoding
- #90812 - on Windows, incorrectly produces/expects file URIs beginning file://// (four slashes), which is incompatible with pathlib’s implementation.
- These functions expect you to add and remove the file:// prefix yourself. The Windows bug mentioned above misleads some folks into thinking they need to add/remove file: (no slashes).
- The Windows variant isn’t documented.
- The operations have much more to do with OS paths than URLs, so urllib is arguably the wrong place for them.
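As a rough illustration of the POSIX approach mentioned above (calling urllib.parse.quote() and unquote() and handling the file:// prefix by hand), here is a minimal sketch. The helper names posix_path_to_file_uri() and file_uri_to_posix_path() are hypothetical, and the sketch assumes UTF-8 and an empty authority, so it sidesteps the filesystem-encoding question raised in #85168 rather than answering it.

```python
from urllib.parse import quote, unquote

PREFIX = "file://"


def posix_path_to_file_uri(path):
    """Hypothetical helper: percent-encode an absolute POSIX path and add the prefix."""
    if not path.startswith("/"):
        raise ValueError("only absolute paths can be expressed as file URIs")
    # quote() leaves "/" unescaped by default and encodes text as UTF-8.
    return PREFIX + quote(path)


def file_uri_to_posix_path(uri):
    """Hypothetical helper: strip the prefix and percent-decode the rest."""
    if not uri.startswith(PREFIX + "/"):
        raise ValueError("expected a URI of the form file:///...")
    return unquote(uri[len(PREFIX):])


uri = posix_path_to_file_uri("/tmp/my file.txt")
print(uri)                          # file:///tmp/my%20file.txt
print(file_uri_to_posix_path(uri))  # /tmp/my file.txt
```

Note that the prefix is added and removed by the caller's code, which is exactly the chore the bullet above complains about.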
pathlib
The pathlib.PurePath class provides an as_uri() method. Again, the implementation depends on the current platform. The Windows and POSIX variants can be found in PureWindowsPath and PurePosixPath. Bugs and discussion:
- #91504 - there’s no way to convert a URI to a path
- pathlib is 90% a high-level wrapper around os, ntpath and posixpath. The as_uri() method is one of a small handful of exceptions where pathlib implements low-level path manipulation logic itself. IMO pathlib is arguably the wrong place for its implementation.
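For reference, a short sketch of what as_uri() produces for each flavour. The example paths are made up, and the warning filter is an assumption: newer Python versions may emit a DeprecationWarning when as_uri() is called on a pure path rather than a concrete one.

```python
import warnings
from pathlib import PurePosixPath, PureWindowsPath

with warnings.catch_warnings():
    # Assumption: silence a possible DeprecationWarning on newer Python versions.
    warnings.simplefilter("ignore", DeprecationWarning)
    posix_uri = PurePosixPath("/tmp/my file.txt").as_uri()
    drive_uri = PureWindowsPath("C:/Users/me/notes.txt").as_uri()
    unc_uri = PureWindowsPath("//server/share/notes.txt").as_uri()

print(posix_uri)  # file:///tmp/my%20file.txt
print(drive_uri)  # file:///C:/Users/me/notes.txt
print(unc_uri)    # file://server/share/notes.txt
```

Because the pure path classes work on any platform, this is one way to get Windows-style URIs on POSIX and vice versa.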
os.path (proposal!)
I propose we add two new functions to os.path that parse and generate file:// URIs. I haven’t found good names for them yet, so here are their working names:
- os.path.fileuri() - returns a file URI from the given path.
- os.path.fileuriparse() - returns a path from the given file URI.
Their implementations would live in ntpath and posixpath, like most other os.path functionality.
We can then adjust the previously mentioned modules:
- pathlib.PurePath.as_uri() - remove implementation, call through to fileuri()
- pathlib.PurePath.from_uri() - add this new classmethod, call through to fileuriparse()
- urllib.request - replace usages of url2pathname() with fileuriparse()
- urllib.request - deprecate pathname2url() and url2pathname()
- nturl2path - deprecate pathname2url() and url2pathname() (and the entire module?).
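To make the "call through" idea concrete, here is a hedged sketch of the delegation pattern. Nothing in it is real API: fileuriparse() is a stand-in for the proposed function (POSIX flavour only, UTF-8 assumed), and UriAwarePosixPath is a hypothetical subclass used only so the example runs on its own.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote


def fileuriparse(uri):
    """Hypothetical stand-in for the proposed os.path function (POSIX only)."""
    prefix = "file://"
    if not uri.startswith(prefix + "/"):
        raise ValueError(f"not an absolute file URI: {uri!r}")
    return unquote(uri[len(prefix):])


class UriAwarePosixPath(PurePosixPath):
    """Hypothetical subclass showing the delegation pattern."""

    @classmethod
    def from_uri(cls, uri):
        # No path logic lives here: parsing is left to the lower-level function.
        return cls(fileuriparse(uri))


path = UriAwarePosixPath.from_uri("file:///etc/hosts")
print(path)  # /etc/hosts
```

The point of the design is that the path class stays a thin wrapper, while the OS-specific rules sit in one lower-level place.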
I believe this would have the following benefits:
- Improve the experience for users who want to parse and generate file:// URIs, who usually end up on this SO post with 40k views or one of several others.
- Reduce the scope for bugs and incompatibilities in urllib and pathlib by unifying their underlying file URI implementations.
- Slightly simplify the urllib codebase, including letting us deprecate the nturl2path module.
- Slightly simplify the pathlib codebase by more consistently delegating low-level tasks to posixpath and ntpath.
Thanks for reading. What do you think?
guido (Guido van Rossum) May 7, 2022, 10:35pm 2
Aren’t file URLs a bad idea from a security POV?
barneygale (Barney Gale) May 7, 2022, 10:59pm 3
They’re still well-supported in web browsers and a few other applications (GNOME and Windows shell use them IIRC). The security considerations are the same as for other URLs I think - if you’re taking untrusted user input, it’s better to include allowed protocols like http:// rather than exclude disallowed protocols like file://, ftp://, etc.
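A minimal sketch of the allow-list approach, for illustration. The ALLOWED_SCHEMES set and the is_allowed_url() name are assumptions; a real application would pick its own schemes and would likely validate more than the scheme.

```python
from urllib.parse import urlsplit

# Assumption: this application only wants to follow web links.
ALLOWED_SCHEMES = {"http", "https"}


def is_allowed_url(url):
    """Return True only if the URL's scheme is explicitly allowed."""
    try:
        scheme = urlsplit(url).scheme.lower()
    except ValueError:
        # urlsplit() rejects some malformed input; treat that as not allowed.
        return False
    return scheme in ALLOWED_SCHEMES


for candidate in ("https://example.com/", "file:///etc/passwd", "ftp://example.com/x"):
    print(candidate, is_allowed_url(candidate))
```

With an allow-list, a scheme nobody thought about is rejected by default, which is the advantage over listing the schemes to block.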
guido (Guido van Rossum) May 7, 2022, 11:48pm 4
That’s fair – I had assumed file: URLs were going out of style, but it seems that was premature.
But then, since file:... is a URL(*), why is it wrong to have the fundamental support be in urllib? I’d be amenable to your proposal if you chose to stick it there.
Regarding the naming, I recommend something symmetric, e.g. url_to_path() and path_to_url().
(*) Or a URI? There doesn’t seem to be agreement on what’s what – even standards bodies seem to disagree.
barneygale (Barney Gale) May 8, 2022, 12:40am 5
I suppose file URIs straddle the “URI” vs “file path” divide by their nature. For me, it falls more on the “file path” side of things because the rules vary by OS:
- POSIX uses the local filesystem encoding, but Windows uses UTF-8
- POSIX just prepends the path with file://, but Windows additionally removes two leading slashes (for UNC paths) or adds one (for local drive paths), and doesn’t percent-encode colons in drives.
urllib doesn’t otherwise do much per-OS stuff; this is the exception.
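The Windows rules listed above can be sketched roughly as follows. windows_path_to_file_uri() is a hypothetical helper, not standard library code; it assumes UTF-8 and only handles plain drive and UNC paths, ignoring device paths and other edge cases.

```python
from urllib.parse import quote


def windows_path_to_file_uri(path):
    """Hypothetical helper applying the Windows rules described above."""
    path = path.replace("\\", "/")
    if path.startswith("//"):
        # UNC path: drop the two leading slashes so the server becomes the authority.
        return "file://" + quote(path[2:])
    if len(path) >= 3 and path[0].isascii() and path[0].isalpha() and path[1:3] == ":/":
        # Drive path: add one slash, and leave the drive's colon unescaped.
        return "file:///" + path[:2] + quote(path[2:])
    raise ValueError("expected an absolute drive or UNC path")


print(windows_path_to_file_uri(r"C:\Users\me\my file.txt"))  # file:///C:/Users/me/my%20file.txt
print(windows_path_to_file_uri(r"\\server\share\a b.txt"))   # file://server/share/a%20b.txt
```

Compare this with the POSIX case, where prepending file:// to the quoted path is the whole job.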
On URI vs URL: this w3c document says:
a URL is a type of URI that identifies a resource via a representation of its primary access mechanism (e.g., its network “location”), rather than by some other attributes it may have.
Which makes some sense to me.
Regarding the naming, I recommend something symmetric, e.g. url_to_path() and path_to_url().
Thanks, that’s certainly an improvement.
pf_moore (Paul Moore) May 8, 2022, 10:56am 6
This is the bit that’s always confused me. Does this mean that it’s not possible to write a file URL in a cross-platform manner, even when it’s possible to write the equivalent path in a cross-platform way? (Paths are typically usable cross-platform as long as you only care about the current drive on Windows - even if semantically, something like /etc is clearly POSIX-specific).
barneygale (Barney Gale) May 8, 2022, 11:16am 7
I think your analysis is correct, because the trick you’re relying on (omitting the drive letter) makes the path non-absolute, and relative file URIs aren’t supported in RFC 8089.
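A quick way to see the point about absoluteness is to ask pathlib, which treats a drive-less Windows path as non-absolute (and whose as_uri() raises ValueError for relative paths). The paths here are only examples.

```python
from pathlib import PurePosixPath, PureWindowsPath

posix_abs = PurePosixPath("/etc").is_absolute()
windows_rooted_abs = PureWindowsPath("/etc").is_absolute()
windows_drive_abs = PureWindowsPath("C:/etc").is_absolute()

print(posix_abs)           # True
print(windows_rooted_abs)  # False: rooted, but no drive
print(windows_drive_abs)   # True
```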
barneygale (Barney Gale) May 9, 2025, 6:14pm 8
This topic seems to come up in search engine results, so I’d like to give an update. Support for file URIs is much improved in Python 3.14, and some of the changes were backported.
These urllib.request changes appeared in Python 3.12 and 3.13 bugfix releases:
- #85168 - use filesystem encoding
- #126212 - don’t remove slashes on Windows
- #126205 - pathname2url(): generate a URI with two leading slashes (not four) when given a UNC path on Windows
- #127217 - pathname2url(): generate a URI with four leading slashes (not two) when given a path with two leading slashes on POSIX
- #120423 - pathname2url(): handle forward slashes like backward slashes on Windows
- #126766 - url2pathname(): discard empty or ‘localhost’ authority
- #127078 - url2pathname(): support UNC URI with 5 leading slashes on Windows
And in Python 3.14.0 beta 1 we have:
- #125866 - support complete file URIs if new add_scheme / require_scheme arguments are set to true; preserve DOS drive letter case on Windows
- #127236 - pathname2url(): generate a URI with three leading slashes (not one) when given a path with one leading slash
- #126601 - pathname2url(): don’t raise OSError when path contains colon characters not following a drive letter on Windows
- #126367 - url2pathname(): don’t raise OSError when URI contains colon characters not following a drive letter on Windows
- #123599 - url2pathname(): discard authority if it matches the local hostname, or if it resolves to local IP address and the new resolve_host argument is set to true. If the authority remains unhandled, then return a UNC path on Windows (as before), and raise URLError on other platforms.
Demo on Linux (3.14+):
>>> from urllib.request import pathname2url, url2pathname
>>> pathname2url('/etc/hosts', add_scheme=True)
'file:///etc/hosts'
>>> url2pathname('file:///etc/hosts', require_scheme=True)
'/etc/hosts'
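For code that has to run on older versions too, one possible shape for a compatibility wrapper is sketched below. The names path_to_file_uri() and file_uri_to_path() are hypothetical, and the fallback branch is a best-effort assumption reasoned through only for simple POSIX paths; older Windows behaviour is subject to the bugs listed above.

```python
import sys
from urllib.request import pathname2url, url2pathname

# The add_scheme / require_scheme arguments are new in Python 3.14.
HAS_SCHEME_ARGS = sys.version_info >= (3, 14)


def path_to_file_uri(path):
    """Hypothetical wrapper: return a complete file: URI for an absolute path."""
    if HAS_SCHEME_ARGS:
        return pathname2url(path, add_scheme=True)
    url = pathname2url(path)
    # Older versions return the URI without its scheme, with or without
    # the leading "//" that introduces the authority section.
    return ("file:" if url.startswith("//") else "file://") + url


def file_uri_to_path(uri):
    """Hypothetical wrapper: return a path from a complete file: URI."""
    if HAS_SCHEME_ARGS:
        return url2pathname(uri, require_scheme=True)
    if not uri.startswith("file:"):
        raise ValueError(f"not a file URI: {uri!r}")
    rest = uri[len("file:"):]
    if rest.startswith("///"):
        # Drop the empty authority, which older versions may not discard.
        rest = rest[2:]
    return url2pathname(rest)


print(path_to_file_uri("/etc/hosts"))
print(file_uri_to_path("file:///etc/hosts"))
```

On Linux this should round-trip /etc/hosts the same way as the demo above, whichever branch is taken.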
I’ve adjusted pathlib’s Path.as_uri() and from_uri() to call pathname2url() and url2pathname(). We’ve also deprecated the oddball nturl2path module along the way.
As a result of these changes, I reckon support for generating and parsing file: URIs in Python is now pretty good! There’s no need for additional functions as proposed in my original post.
Huge thanks to @storchaka and @steve.dower for their expert guidance, reviews, and other contributions, which were essential.