[Python-Dev] Bytes path support

Stephen J. Turnbull stephen at xemacs.org
Sat Aug 23 11:02:06 CEST 2014


Chris Barker writes:

So I write bytes that are encoded one way into a text file that's encoded another way, and expect to be able to read that later?

No, not you. Crap software does that. Your MUD server. Oleg's favorite web pages with ads, or more likely the ad servers.

Not for me (or many other users) -- terminals are sometimes set with ascii-only encoding,

So? That means you can't handle text files in general, only those restricted to ASCII. That's a completely different issue.

Python3 supports this case very well. But it does indeed make it hard to work with filenames when you don't know the encoding they are in.

No, it doesn't. Reasonably handling "text streams" in unknown, possibly multiple, encodings is just hard. Python 3 has nothing to do with it, and Oleg should know that very well.

It's true that code written in Python 2 to handle these issues needs to be ported to Python 3. Thing is, Oleg says "another tool" -- any non-Python-2 tool will need porting of his code too.

And apparently that's pretty common -- or common enough that it would be nice for Python to support it well. The trick is how -- we'd like the "just pass it around and do path manipulations" case to work with (almost) arbitrary bytes,

It does. That's what os.path is for.
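
To illustrate the point about os.path, here is a minimal sketch of path manipulation on raw bytes that never get decoded. The file name is hypothetical, and posixpath (the module os.path resolves to on POSIX systems) is used directly so the result does not depend on the platform running the example.

```python
import posixpath  # what os.path resolves to on POSIX systems

# Hypothetical Latin-1 encoded name; these bytes are not valid UTF-8.
raw = b"/srv/caf\xe9/r\xe9sum\xe9.txt"

# Split, inspect, and rebuild the path without ever decoding it.
directory, name = posixpath.split(raw)
stem, ext = posixpath.splitext(name)
backup = posixpath.join(directory, stem + b".bak")

print(directory)
print(name)
print(backup)
```

The bytes pass through untouched because these functions only look for the separator and the dot, which are ASCII in any ASCII-compatible encoding.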

but everything else to work naturally with text (unicode text).

No gloss, please. It's text, period. The internal Unicode encoding is not exposed, with a few (important) exceptions such as Han unification.

I think the way to do this is to abstract the path concept, like pathlib does.

You forgot to append the word "well".

From my personal experience, non-ascii filenames are much easier to deal with if I use unicode for filenames everywhere (py2). Somehow, I have yet to be bitten by mixed encoding in filenames.

.gov domain? ASCII-only terminal settings? It's not "somehow", it's that you live a sheltered life.

So will using a surrogate-escape error handling with pathlib make all this just work?

Not answerable until you define "all this" more precisely.
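
One narrow reading of "all this" can at least be sketched: decode bytes of unknown encoding with the surrogateescape error handler, manipulate the result as text with pathlib, and encode back to the original bytes. The path below is hypothetical, and UTF-8 is an assumed stand-in for whatever the filesystem encoding happens to be. This shows only the round trip, not a general answer to the question.

```python
from pathlib import PurePosixPath

# Hypothetical path whose directory name is Latin-1, not valid UTF-8.
raw = b"/srv/caf\xe9/notes.txt"

# Undecodable bytes become lone surrogates (0xE9 -> U+DCE9) instead of errors.
text = raw.decode("utf-8", "surrogateescape")

# Ordinary text-based path manipulation works on the decoded string.
renamed = PurePosixPath(text).with_suffix(".bak")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = str(renamed).encode("utf-8", "surrogateescape")

# A strict encode refuses the lone surrogate, so the string cannot be
# written to a UTF-8 text stream without choosing some error handling.
try:
    str(renamed).encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(round_tripped, strict_ok)
```

The round trip is lossless, but the intermediate string is not clean text: printing it or writing it to a strictly encoded file can still fail.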

And that's the big problem with Oleg's complaint, too. It's not at all clear what he wants, except that all of his current code should continue to work in Python 3. Just like all of us. The question then is persuading him that it's worth moving to Python 3 despite the effort of porting Python-2-specific code. Maybe he can be persuaded, maybe not. Python 2 is a better than average language.


