[Python-Dev] Python-3.0, unicode, and os.environ

Guido van Rossum guido at python.org
Fri Dec 5 06:14:39 CET 2008


On Dec 4, 2008, at 6:39 PM, Martin v. Löwis wrote:

I'm in favour of a different, fifth solution:

5) represent all environment variables in Unicode strings, including the ones that currently fail to decode. (then do the same to file names, then drop the byte-oriented file operations again)

On Thu, Dec 4, 2008 at 6:14 PM, James Y Knight <foom at fuhm.net> wrote: [...]

FWIW, I still agree with Martin that that's the most reasonable solution.

On Thu, Dec 4, 2008 at 6:32 PM, Adam Olsen <rhamph at gmail.com> wrote:

It died because nobody presented a viable solution, and I maintain no solution is possible. All suggestions involve arbitrary transformations that fail to round trip correctly at some point or another. They're simply about shuffling the failure around to somewhere the poster happens to like.

Please, if you have a new idea that doesn't have a failure mode, by all means post it. But don't resurrect a pointless bikeshed.

I don't like Martin's solution at all. Glyph's message nails the problem -- the "funny encoding" solution breaks as soon as filenames get passed to other components, and as that's what Python is often all about, it's likely to happen all the time.

The simplest example I can think of is a program that prints a directory listing to stdout -- printing the "funny" encoding to stdout isn't going to be what users expect. So the program has to be aware of the possibility of "funny" encoded filenames, and the roundtripping isn't useful at all.

At the risk of bringing up something that was already rejected, let me propose something that follows the path taken in 3.0 for filenames, rather than doubling back:

For os.environ, os.getenv() and os.putenv(), I think an approach similar to the one used for os.listdir() and os.getcwd() makes sense: let os.environ skip variables whose name or value is undecodable, and have a separate os.environb() which contains bytes; let os.getenv() and os.putenv() do the right thing when the arguments passed in are bytes.
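
A minimal sketch of that split, assuming UTF-8 as the system encoding. The helper name split_environ is hypothetical and is not an os API; it only illustrates a text view that skips undecodable variables next to a bytes view that keeps everything:

```python
def split_environ(raw_env, encoding="utf-8"):
    """Return (text_env, bytes_env) built from a bytes-to-bytes mapping.

    Hypothetical helper: text_env silently skips any variable whose
    name or value cannot be decoded; bytes_env keeps every entry.
    """
    text_env = {}
    bytes_env = dict(raw_env)
    for name, value in raw_env.items():
        try:
            text_env[name.decode(encoding)] = value.decode(encoding)
        except UnicodeDecodeError:
            # Undecodable entry: visible only through the bytes view.
            continue
    return text_env, bytes_env


# b"caf\xe9" is Latin-1 encoded and is not valid UTF-8.
raw = {b"HOME": b"/home/guido", b"BAD": b"caf\xe9"}
text_env, bytes_env = split_environ(raw)
```

Here text_env holds only HOME, while bytes_env still holds both variables, so a program that cares about the undecodable one can find it there.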

For sys.argv, because it's positional, you can't skip undecodable values, so I propose to use errors='replace' for the decoding; again, we can add sys.argvb that contains the raw bytes values. The various os.exec*() and os.spawn*() calls (as well as os.system(), os.popen() and the subprocess module) should all accept bytes as well as strings.
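
A rough illustration of why replacement keeps positions intact, again assuming UTF-8 as the system encoding. The function decode_argv and the sample arguments are made up for this sketch:

```python
def decode_argv(raw_argv, encoding="utf-8"):
    """Decode every argument, substituting U+FFFD for undecodable bytes.

    Hypothetical helper: no argument is dropped, so indexes in the
    decoded list line up with indexes in the raw bytes list.
    """
    return [arg.decode(encoding, errors="replace") for arg in raw_argv]


# The second argument contains a Latin-1 byte that is invalid UTF-8.
raw_argv = [b"prog", b"caf\xe9.txt", b"ok"]
argv = decode_argv(raw_argv)
```

The decoded list has the same length as the raw one, but the replacement is lossy: the original bytes of the second argument can only be recovered from the raw list.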

On Windows, the bytes APIs should probably not exist.

I predict that most developers can get away with not using the bytes APIs at all. The small minority that needs to stay robust when not all filenames use the system encoding can use the bytes APIs. That would be developers on various Unix systems other than OS X (which uses UTF-8 for its filesystems), and perhaps the occasional developer on OS X whose app needs to work with files on mounted filesystems that use a different encoding.
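
For the filename case, the bytes API amounts to passing a bytes path: os.listdir() then returns bytes names. A small sketch, where list_raw_names is a made-up wrapper and os.fsencode() is used only as a convenience to turn the path into bytes (an assumption; it requires a Python version that provides it):

```python
import os
import tempfile


def list_raw_names(directory):
    """Return directory entries as bytes by listing with a bytes path."""
    return sorted(os.listdir(os.fsencode(directory)))


with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file so the listing has something to show.
    open(os.path.join(tmp, "hello.txt"), "w").close()
    names = list_raw_names(tmp)
```

Because the names stay as bytes, nothing is decoded and nothing can fail to decode; the caller decides how to display them.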

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


