[Python-Dev] Bytes path support

Marko Rauhamaa marko at pacujo.net
Thu Aug 21 15:58:03 CEST 2014


"Martin v. Löwis" <martin at v.loewis.de>:

I think the people defending the "Unix file names are just bytes" side often miss an important detail: displaying file names to the user, and allowing the user to enter file names.

The user interface is a real issue and needs to be addressed. It is separate from the OS interface, though.

A script that just needs to traverse a directory tree and look at files by certain criteria can easily do so without worrying about a text interpretation of the file names.
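As a rough sketch of such a script: when os.walk() is given a bytes path, it yields bytes names, so nothing ever has to be decoded. The function name find_large_files and the size criterion are made up for illustration.

```python
import os

def find_large_files(root: bytes, min_size: int):
    """Yield (path, size) for non-directory entries under root
    whose size is at least min_size bytes.

    root is a bytes path, so every name os.walk() hands back is
    bytes as well; no text decoding takes place anywhere.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)  # bytes + bytes -> bytes
            try:
                size = os.lstat(path).st_size
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if size >= min_size:
                yield path, size
```

Something like find_large_files(b"/var/log", 1 << 20) would then report paths as raw bytes, whatever encoding their names were created in.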

A single system often has file names that have been encoded with different schemes. Only today, I have had to deal with the JIS character table (<URL: http://i.msdn.microsoft.com/cc305152.932%28en-us,MSDN.10%29.gif>) -- you will notice that it doesn't have a backslash character. A coworker uses ISO-8859-1.

I use UTF-8. UTF-8, of course, will refuse to deal with some byte sequences.
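To make that concrete, here is a small sketch with an invented file name: the bytes of "café.txt" as an ISO-8859-1 user would have written them. The helper try_decode is hypothetical, just a wrapper around bytes.decode().

```python
def try_decode(raw: bytes, encoding: str):
    """Return raw decoded with the given encoding, or None if the
    bytes are not valid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

# "café.txt" as ISO-8859-1 bytes (invented example name)
latin1_name = b"caf\xe9.txt"

as_utf8 = try_decode(latin1_name, "utf-8")         # invalid UTF-8: None
as_latin1 = try_decode(latin1_name, "iso-8859-1")  # decodes fine
```

The byte 0xE9 opens a three-byte sequence in UTF-8, and the "." that follows is not a continuation byte, so a strict UTF-8 decoder rejects the name.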

My point is that the poor programmer cannot ignore the possibility of "funny" character sets. If Python tried to protect the programmer from that possibility, the result might be even more intractable: how do you act on a file with a non-UTF-8 filename if you are unable to express it as a text string?
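For reference, Python 3's "surrogateescape" error handler (PEP 383) is one existing way to carry such a name inside a str. The sketch below uses an invented file name and shows only the round trip; the resulting string still cannot be encoded strictly, so the "funny" bytes have not gone away.

```python
# Invented example: "café.txt" as ISO-8859-1 bytes, invalid as UTF-8
raw_name = b"caf\xe9.txt"

# The undecodable byte 0xE9 becomes the lone surrogate U+DCE9
text_name = raw_name.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly
round_trip = text_name.encode("utf-8", "surrogateescape")

# A strict encode, as needed for display or logging, still fails
try:
    text_name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

So the name can be passed back to the OS unchanged, but presenting it to the user remains a separate problem.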

Marko


