[Python-Dev] pathlib - current status of discussions

Nick Coghlan ncoghlan at gmail.com
Tue Apr 12 04:56:44 EDT 2016


On 12 April 2016 at 15:28, Stephen J. Turnbull <stephen at xemacs.org> wrote:

> Donald Stufft writes:
>
> > I think yes and yes [__fspath__ and fspath should be allowed to
> > handle bytes, otherwise] it seems like making it needlessly harder
> > to deal with a bytes path
>
> It's not needless. This kind of polymorphism makes it hard to
> review code locally. Once bytes get a foothold inside a text
> application, they metastasize altogether too easily, and you end up
> with TypeErrors or UnicodeErrors quite far from the origin.
> Debugging often requires tracing data flows over hill and over dale
> while choking from the dusty trail, or band-aids like a top-level
> "except UnicodeError: log_and_quarantine(bytes)".
>
> I can't prove that returning bytes from these APIs is a big risk in
> this sense, but I can't see a way to prove that it's not, either,
> given that their point is duck-typing, and therefore they may be
> generalized in the future, and by third parties.
>
> I understand that there are applications where it's bytes all the
> way down, but by the very nature of computing systems, there are
> systems where bytes are decoded to text. For historical reasons
> (the encoding Tower of Babel), it's very error-prone to do that on
> demand. Best practice is to do the conversion as close to the
> boundary as possible, and process only text internally.

One possible way to address this concern would be to have the underlying protocol be bytes/str (since boundary code frequently needs to handle the paths-are-bytes assumption in POSIX), but offer an "os.fspathname" API that rejected bytes output from os.fspath. That is, it would be equivalent to:

    import os

    def fspathname(path):
        name = os.fspath(path)
        # Reject bytes (or anything else that isn't str) right here
        if not isinstance(name, str):
            raise TypeError("Expected str for pathname, not {}".format(type(name)))
        return name

That way folks that wanted the clean "must be str" signature could use os.fspathname, while those that wanted to accept either could use the lower level os.fspath.
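
As a rough illustration of that split, here is a sketch using two hypothetical path-like classes (TextPath and BytesPath are invented for this example and are not part of any proposal). It assumes os.fspath accepts str, bytes, or any object implementing __fspath__, and it repeats fspathname so the block runs on its own:

```python
import os


def fspathname(path):
    """Strict variant: like os.fspath, but reject non-str results."""
    name = os.fspath(path)
    if not isinstance(name, str):
        raise TypeError(
            "Expected str for pathname, not {}".format(type(name)))
    return name


class TextPath:
    """Hypothetical path-like object whose __fspath__ returns str."""

    def __init__(self, name):
        self._name = name

    def __fspath__(self):
        return self._name


class BytesPath:
    """Hypothetical path-like object whose __fspath__ returns bytes."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        return self._raw


# The lower-level call accepts both flavours and returns them unchanged.
text_result = os.fspath(TextPath("spam.txt"))
bytes_result = os.fspath(BytesPath(b"spam.txt"))

# The strict call passes str through and fails early on bytes.
strict_result = fspathname(TextPath("spam.txt"))
try:
    fspathname(BytesPath(b"spam.txt"))
except TypeError as exc:
    rejected = str(exc)
else:
    rejected = None
```

The point of the strict wrapper is that the TypeError is raised at the call site that asked for text, rather than somewhere further along the data flow.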

The ambiguity in question here is inherent in the differences between the way POSIX and Windows work, so there are limits to how far we can go in hiding it without making things worse rather than better.
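
To make the boundary point concrete, here is a minimal sketch of decoding once at the edge of a program and working with str afterwards. The helper name to_text_path is hypothetical. It relies on os.fsdecode and os.fsencode, and the handling of undecodable bytes shown in the guarded part is an assumption about POSIX behaviour only; Windows treats such bytes differently, which is exactly the ambiguity described above.

```python
import os


def to_text_path(path):
    """Hypothetical boundary helper: normalise str or bytes to str.

    os.fsdecode leaves str unchanged and decodes bytes using the
    filesystem encoding and error handler.
    """
    return os.fsdecode(path)


# str passes through untouched; ASCII bytes decode on every platform.
from_text = to_text_path("plain.txt")
from_bytes = to_text_path(b"plain.txt")

# Assumption: on POSIX, arbitrary bytes survive a decode/encode round
# trip, so the original name can still be handed back to the OS.
if os.name == "posix":
    raw = b"caf\xe9.txt"  # not valid UTF-8
    round_tripped = os.fsencode(to_text_path(raw))
else:
    raw = round_tripped = None
```

Code past this boundary only ever sees str, while the bytes form stays recoverable for the final system call where the platform allows it.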

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia


