File:// URIs in Python

I’d like to tour the standard library’s existing support for file URIs and then make a proposal.

urllib

The urllib.request module has longstanding support for parsing and generating file URIs with pathname2url() and url2pathname(). The implementation depends on the current platform. To always use Windows semantics, one can import the same functions from the undocumented nturl2path module. On POSIX, one can call urllib.parse.quote() and unquote(). Bugs and discussion:
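For readers unfamiliar with the urllib functions described above, here is a minimal sketch of how they can be called. The example path is an assumption made up for illustration, and the exact string returned by pathname2url() differs between Python versions and platforms, so only the round trip is relied on here. The last lines show the POSIX-only approach with urllib.parse.quote() and unquote().

```python
from urllib.parse import quote, unquote
from urllib.request import pathname2url, url2pathname

# Hypothetical POSIX-style path containing a space.
path = "/tmp/my file.txt"

# Platform-dependent conversion. The result is the path portion of a URL;
# the number of leading slashes varies between Python versions.
url_path = pathname2url(path)
round_tripped = url2pathname(url_path)

# POSIX-only alternative: percent-encode the path and add the scheme by hand.
manual_uri = "file://" + quote(path)
manual_path = unquote(manual_uri[len("file://"):])
```

On a POSIX system, round_tripped and manual_path should both equal the original path.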

pathlib

The pathlib.PurePath class provides an as_uri() method. Again, the implementation depends on the current platform. The Windows and POSIX variants can be found in PureWindowsPath and PurePosixPath. Bugs and discussion:
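As a small illustration of the pathlib behaviour described above, the two pure-path flavours can be instantiated on any platform, so both variants of as_uri() can be compared side by side. The paths are made up for the example, and the warning filter is only a precaution, on the assumption that some newer releases may warn about calling as_uri() on pure paths.

```python
import warnings
from pathlib import PurePosixPath, PureWindowsPath

with warnings.catch_warnings():
    # Precaution: ignore a possible DeprecationWarning on newer releases.
    warnings.simplefilter("ignore", DeprecationWarning)

    # POSIX flavour: the absolute path follows the empty authority directly.
    posix_uri = PurePosixPath("/etc/hosts").as_uri()

    # Windows flavour: the drive letter becomes the first path segment,
    # and the space is percent-encoded.
    windows_uri = PureWindowsPath("C:/Program Files").as_uri()

print(posix_uri)    # file:///etc/hosts
print(windows_uri)  # file:///C:/Program%20Files
```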

os.path (proposal!)

I propose we add two new functions to os.path that parse and generate file:// URIs. I haven’t found good names for them yet, so here are their working names:

Their implementations would live in ntpath and posixpath, like most other os.path functionality.

We can then adjust the previously mentioned modules:

I believe this would have the following benefits:

Thanks for reading. What do you think?

guido (Guido van Rossum) May 7, 2022, 10:35pm 2

Aren’t file URLs a bad idea from a security POV?

barneygale (Barney Gale) May 7, 2022, 10:59pm 3

They’re still well supported in web browsers and a few other applications (GNOME and the Windows shell use them, IIRC). I think the security considerations are the same as for other URLs: if you’re taking untrusted user input, it’s better to allow only known-good schemes such as http:// than to try to exclude disallowed ones like file://, ftp://, etc.
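The allow-list idea could be sketched roughly as follows. The set of permitted schemes and the helper name is_allowed_url() are assumptions invented for this example, not part of any library.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; adjust to whatever the application really needs.
ALLOWED_SCHEMES = {"http", "https"}


def is_allowed_url(url):
    """Return True only if the URL's scheme is explicitly allowed."""
    try:
        scheme = urlsplit(url).scheme
    except ValueError:
        # urlsplit() rejects some malformed URLs; treat those as not allowed.
        return False
    return scheme.lower() in ALLOWED_SCHEMES


print(is_allowed_url("https://example.com/"))  # True
print(is_allowed_url("file:///etc/passwd"))    # False
```

Anything not on the list, including file:// and schemes nobody thought to block, is rejected by default.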

guido (Guido van Rossum) May 7, 2022, 11:48pm 4

That’s fair – I had assumed file: URLs were going out of style, but it seems that was premature.

But then, since file:... is a URL(*), why is it wrong to have the fundamental support be in urllib? I’d be amenable to your proposal if you chose to stick it there.

Regarding the naming, I recommend something symmetric, e.g. url_to_path() and path_to_url().


(*) Or a URI? There doesn’t seem to be agreement on what’s what – even standards bodies seem to disagree.

barneygale (Barney Gale) May 8, 2022, 12:40am 5

I suppose file URIs straddle the “URI” vs “file path” divide by their nature. For me, it falls more on the “file path” side of things because the rules vary by OS:

urllib doesn’t otherwise do much per-OS stuff; this is the exception.

On URI vs URL: this w3c document says:

a URL is a type of URI that identifies a resource via a representation of its primary access mechanism (e.g., its network “location”), rather than by some other attributes it may have.

Which makes some sense to me.

Regarding the naming, I recommend something symmetric, e.g. url_to_path() and path_to_url().

Thanks, that’s certainly an improvement :slight_smile:

pf_moore (Paul Moore) May 8, 2022, 10:56am 6

This is the bit that’s always confused me. Does this mean that it’s not possible to write a file URL in a cross-platform manner, even when it’s possible to write the equivalent path in a cross-platform way? (Paths are typically usable cross-platform as long as you only care about the current drive on Windows - even if semantically, something like /etc is clearly POSIX-specific).

barneygale (Barney Gale) May 8, 2022, 11:16am 7

I think your analysis is correct, because the trick you’re relying on (omitting the drive letter) makes the path non-absolute, and relative file URIs aren’t supported in RFC 8089.
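This point can be checked with pathlib's pure paths, which work on any platform. In the sketch below, try_as_uri() is a hypothetical helper written for this example; it relies on as_uri() raising ValueError for paths that are not absolute. The warning filter is a precaution, on the assumption that some newer releases may warn about as_uri() on pure paths.

```python
import warnings
from pathlib import PurePosixPath, PureWindowsPath


def try_as_uri(path):
    """Return the file URI for path, or None if as_uri() rejects it."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        try:
            return path.as_uri()
        except ValueError:
            # Raised for paths that are not absolute.
            return None


rooted_posix = try_as_uri(PurePosixPath("/etc"))         # absolute on POSIX
driveless_windows = try_as_uri(PureWindowsPath("/etc"))  # no drive: not absolute
with_drive = try_as_uri(PureWindowsPath("C:/etc"))       # absolute on Windows

print(rooted_posix)       # file:///etc
print(driveless_windows)  # None
print(with_drive)         # file:///C:/etc
```

The same textual path, /etc, yields a URI under POSIX rules but is rejected under Windows rules because it lacks a drive.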

barneygale (Barney Gale) May 9, 2025, 6:14pm 8

This topic seems to come up in search engine results, so I’d like to give an update. Support for file URIs is much improved in Python 3.14, and some of the changes were backported.

These urllib.request changes appeared in Python 3.12 and 3.13 bugfix releases:

And in Python 3.14.0 beta 1 we have:

Demo on Linux (3.14+):

>>> from urllib.request import pathname2url, url2pathname
>>> pathname2url('/etc/hosts', add_scheme=True)
'file:///etc/hosts'
>>> url2pathname('file:///etc/hosts', require_scheme=True)
'/etc/hosts'

I’ve adjusted pathlib’s Path.as_uri() and from_uri() to call pathname2url() and url2pathname(). We’ve also deprecated the oddball nturl2path module along the way.
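A rough sketch of the pathlib round trip, for illustration only: the file name is made up and does not need to exist. Because Path.from_uri() is not available on older Python versions, the example checks for it before calling it.

```python
from pathlib import Path

# Build an absolute path for the current platform; the name is hypothetical.
original = Path.cwd() / "example file.txt"

# Generate a file URI; the space is percent-encoded.
uri = original.as_uri()

# Parse it back where from_uri() exists (it is missing on older versions).
if hasattr(Path, "from_uri"):
    restored = Path.from_uri(uri)
else:
    restored = None
```

Where from_uri() is available, restored should compare equal to original.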

As a result of these changes, I reckon support for generating and parsing file: URIs in Python is now pretty good! There’s no need for additional functions as proposed in my original post.

Huge thanks to @storchaka and @steve.dower for their expert guidance, reviews, and other contributions, which were essential.