[Python-Dev] pathlib - current status of discussions

Paul Moore p.f.moore at gmail.com
Tue Apr 12 04:31:21 EDT 2016


On 12 April 2016 at 06:28, Stephen J. Turnbull <stephen at xemacs.org> wrote:

> Donald Stufft writes:
>
>  > I think yes and yes [__fspath__ and fspath should be allowed to
>  > handle bytes, otherwise] it seems like making it needlessly harder
>  > to deal with a bytes path
>
> It's not needless. This kind of polymorphism makes it hard to review code locally. Once bytes get a foothold inside a text application, they metastasize altogether too easily, and you end up with TypeErrors or UnicodeErrors quite far from the origin. Debugging often requires tracing data flows over hill and over dale while choking from the dusty trail, or band-aids like a top-level "except UnicodeError: log_and_quarantine(bytes)".
>
> I can't prove that returning bytes from these APIs is a big risk in this sense, but I can't see a way to prove that it's not, either, given that their point is duck-typing, and therefore they may be generalized in the future, and by third parties.
>
> I understand that there are applications where it's bytes all the way down, but by the very nature of computing systems, there are systems where bytes are decoded to text. For historical reasons (the encoding Tower of Babel), it's very error-prone to do that on demand. Best practice is to do the conversion as close to the boundary as possible, and process only text internally. In text applications, "bytes as carcinogen" is an apt metaphor.
>
> Now, I'm not Dutch, so I can't tell you it's obvious that the risk to text-processing applications is more important than the inconvenience to byte-shoveling applications. But there is a need to be parsimonious with polymorphism.

As someone who has done a lot of work helping projects to port from the 2.x bytes/text model to the 3.x model, I have similar concerns that rooting out the source of bytes objects appearing in a program could be an issue with the proposed "return either" approach. The most effective tool I have found in fixing programs with text/bytes issues is carefully and thoroughly annotating precisely which functions accept and return bytes, and which accept and return text. The sort of mixed-mode processing we're talking about here makes that substantially harder. And note that os.fspath, as proposed, can return bytes or text independent of the type of its argument; it's not a "bytes in, bytes out" function following the usual pattern of "polymorphic support for bytes".
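A minimal sketch of that last point, assuming the os.fspath behaviour that later shipped in Python 3.6 (the proposal under discussion here could have ended up differently). BytesPath, TextPath and fspath_text are hypothetical names invented for illustration; fspath_text shows one possible way to keep a "text only" annotation honest at the call site.

```python
import os


class BytesPath:
    """Hypothetical path-like object whose __fspath__ returns bytes."""

    def __init__(self, raw: bytes) -> None:
        self._raw = raw

    def __fspath__(self) -> bytes:
        return self._raw


class TextPath:
    """Hypothetical path-like object whose __fspath__ returns str."""

    def __init__(self, text: str) -> None:
        self._text = text

    def __fspath__(self) -> str:
        return self._text


def fspath_text(path) -> str:
    """Hypothetical text-only wrapper: return str or fail at the call site."""
    result = os.fspath(path)
    if isinstance(result, bytes):
        raise TypeError("expected a text path, got bytes from %r" % (path,))
    return result


# Neither argument is a bytes object, yet the result types differ:
# the return type depends on the object's __fspath__, not on the
# argument's own type.
from_bytes_path = os.fspath(BytesPath(b"/tmp/data"))
from_text_path = os.fspath(TextPath("/tmp/data"))
```

With a wrapper like fspath_text, a stray bytes path raises TypeError where it enters the text-only code, rather than somewhere further along the data flow.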

But just like Stephen, I have no feel for how significant the risk will be in real life. I've never worked on code that actually has a need for bytestring paths (particularly now that surrogateescape ensures that most cases "just work").
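As a rough illustration of the surrogateescape remark, the sketch below round-trips a byte string that is not valid UTF-8 through text and back. The file name is made up for the example, and UTF-8 is passed explicitly so the result does not depend on the platform; on POSIX systems os.fsdecode and os.fsencode apply the same error handler with the filesystem encoding.

```python
# Hypothetical file name encoded as Latin-1, so it is not valid UTF-8.
raw = b"caf\xe9.txt"

# Each undecodable byte is mapped to a lone surrogate (U+DC80..U+DCFF)
# instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode("utf-8", errors="surrogateescape")
```

The resulting str can be passed around like any other text path, but a strict encode of it (for example, when printing or logging) will still raise UnicodeEncodeError, which is one way problems can surface far from their origin.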

Paul


