[Python-Dev] Defining a path protocol

Stephen J. Turnbull stephen at xemacs.org
Sun Apr 10 12:29:00 EDT 2016


Ethan Furman writes:

> It means the stuff in place won't change, but the stuff we're adding now to integrate with Path will only support str (which is one reason why os.path isn't going to die).

I don't think this is a reason for keeping os.path. (Backward compatibility with existing code is sufficient, of course.) PEP 383 already makes it possible to represent every file name as a str. ISTM there's no big loss in using PEP 383's 'surrogateescape' handler to allow un-decode-able filenames in pathlib.Path: they're very rare. AFAIK pathlib doesn't care about surrogates -- after all, they're entirely "consenting adults" stuff. Of course that detracts a bit from the attractiveness of pathlib.Path vs. os.path or bytes methods, but only for a use case most people won't encounter in practice.
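A minimal sketch of the PEP 383 round trip through pathlib, assuming a POSIX-style path and hard-coding UTF-8 as the file system encoding (on a real system, os.fsdecode() and os.fsencode() choose the encoding and apply 'surrogateescape' for you). The directory and file name are hypothetical, and nothing touches the disk.

```python
from pathlib import PurePosixPath

# pathlib accepts only str (or path objects yielding str); bytes are rejected.
try:
    PurePosixPath(b"/data")
except TypeError:
    rejected_bytes = True
else:
    rejected_bytes = False

# Hypothetical raw file name as read from disk: 0xE9 is 'é' in Latin-1,
# but on its own it is not valid UTF-8.
raw = b"caf\xe9.txt"

# PEP 383: each undecodable byte 0x80..0xFF becomes a lone surrogate U+DC80..U+DCFF.
name = raw.decode("utf-8", "surrogateescape")
print(repr(name))  # 'caf\udce9.txt'

# pathlib manipulates the str without caring about the surrogate.
p = PurePosixPath("/data") / name
print(p.suffix)  # .txt

# Encoding with the same handler recovers the original bytes exactly.
restored = str(p).encode("utf-8", "surrogateescape")
print(restored)  # b'/data/caf\xe9.txt'
```

The surrogate only matters if someone tries to print or strictly encode the name; ordinary path manipulation is unaffected.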

We continue to support bytes at the os/io/open level for the same reasons you added formatting back to bytes: there are times when it's at least as natural to work with bytes as str (e.g., when the path is passed around without manipulation) and more convenient (e.g., you don't have to deal with encodings and UnicodeError handling).
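A small sketch of the pass-through case, assuming a scratch directory from tempfile (the file name is hypothetical): when a bytes path goes in, os.listdir() hands bytes back, and those can be passed to open() without ever choosing an encoding.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    tmp_b = os.fsencode(tmp)  # bytes form of the scratch directory path

    # Create a file through a bytes path; no str decoding happens anywhere.
    with open(os.path.join(tmp_b, b"report.txt"), "wb") as f:
        f.write(b"hello")

    # Given a bytes argument, os.listdir() returns the names as bytes.
    names = os.listdir(tmp_b)
    print(names)  # [b'report.txt']

    # Those names can be handed straight back to open() unchanged.
    with open(os.path.join(tmp_b, names[0]), "rb") as f:
        data = f.read()
```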

> After all, the idea is to make these things work with the stdlib, and the stdlib accepts bytes for path strings.

I don't see a problem. In dealing with legacy data (archives that include paths, such as .zips and .isos) we may find un-decode-able paths, or paths that are decodable but in an undetermined encoding, for a while to come (decades). For those, the bytes interfaces are preferable to unlovely expedients like decoding as 'iso8859-1'. But those are specialized use cases.
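To illustrate why decoding as 'iso8859-1' is an expedient rather than a fix: it never raises and round-trips losslessly, but the resulting str is mojibake. The archive entry below is hypothetical, with Shift_JIS assumed as the unrecorded encoding. Splitting on b"/" is safe here only because Shift_JIS never uses byte 0x2F inside a multibyte character.

```python
# Hypothetical path stored in a legacy archive that does not record its
# encoding; here it happens to have been written as Shift_JIS.
original = "資料/報告.txt"
raw = original.encode("shift_jis")

# Decoding as ISO 8859-1 never raises: every byte maps to some code point...
latin = raw.decode("iso8859-1")
# ...and it round-trips losslessly back to the same bytes...
assert latin.encode("iso8859-1") == raw
# ...but the str is mojibake, useless for display or name comparison.
print(latin == original)  # False

# Working on the bytes avoids pretending to know the encoding at all.
# (Shift_JIS lead and trail bytes never equal 0x2F, so this split is safe.)
directory, _, filename = raw.partition(b"/")
print(filename.endswith(b".txt"))  # True
```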

Sane people dealing with current file systems won't need bytes in pathlib, and most "out of bounds" uses for pathlib I can think of in my own experience will be able to use surrogateescape.


