[Python-Dev] a suggestion ... Re: PEP 383 (again)

Thomas Breuel tmbdev at gmail.com
Thu Apr 30 16:42:45 CEST 2009


What's an analogous failure? Or, rather, why would a failure analogous to the one I got when using System.IO.DirectoryInfo ever exist in Python?

Mono.Unix uses an encoder and a decoder that know about special quoting rules. System.IO uses a different encoder and decoder because it's a reimplementation of a Microsoft library, and the Mono developers chose not to implement the Mono.Unix quoting rules in it. Nothing technical prevents System.IO from using the Mono.Unix codec; the developers just didn't want to change the behavior of an ECMA and Microsoft library.

The analogous phenomenon will exist in Python with PEP 383. Let's say I have a C library with wide character interfaces and I pass it a unicode string from Python. (*) That C library now turns that unicode string into UTF-8 for writing to disk using its internal UTF-8 converter, which knows nothing about utf-8b. The result is that the file can be opened using Python's "open", but it can't be opened using the other library. There simply is no way you can guarantee that all libraries turn unicode strings into pathnames using utf-8b. I'm not arguing about whether that's good or bad anymore, since it's obvious that the only proposal acceptable to Guido uses some form of non-standard encoding / quoting.
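To make the mismatch concrete, here is a minimal sketch. It assumes a Python 3 interpreter in which the PEP 383 scheme is available as the "surrogateescape" error handler (the form in which the proposal was later implemented). The function strict_utf8 is a hypothetical stand-in for a library's internal UTF-8 converter, not any real library's API.

```python
# A file name containing a Latin-1 byte; this is not valid UTF-8.
raw_name = b"caf\xe9.txt"

# utf-8b / PEP 383 decoding: the undecodable byte becomes a lone
# (half) surrogate, U+DC00 + 0xE9 = U+DCE9.
name = raw_name.decode("utf-8", "surrogateescape")

# Code that uses the same codec round-trips the original bytes.
round_tripped = name.encode("utf-8", "surrogateescape")


def strict_utf8(text):
    """Hypothetical stand-in for a library's own UTF-8 converter.

    Returns the encoded bytes, or None if the string cannot be
    represented in standard UTF-8.
    """
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError:
        return None


# The other library cannot reproduce the original path name.
other_library_result = strict_utf8(name)
```

Here round_tripped equals raw_name, while other_library_result is None: the string only maps back to the right file if every component along the way uses utf-8b.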

I'm simply pointing out that the failure you observed with System.IO has nothing to do with which quoting convention you choose, but results from the fact that the developers of System.IO are not using the same encoder/decoder as Mono.Unix (in that case, by choice).

So, I don't see any reason to prefer your half surrogate quoting to the Mono U+0000-based quoting. Both seem to achieve the same goal with respect to round tripping file names, displaying them, etc., but Mono quoting actually results in valid unicode strings. It works because null is the one character that's not legal in a UNIX path name.
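As a rough illustration of the U+0000 idea (not Mono's actual implementation), the sketch below escapes each undecodable byte as U+0000 followed by a character whose code point is the byte value. The function names nul_quote_decode and nul_quote_encode are hypothetical, and the exact escape layout is an assumption made for the example.

```python
def nul_quote_decode(raw: bytes) -> str:
    """Decode UTF-8, escaping each undecodable byte as U+0000 + chr(byte)."""
    out = []
    pos = 0
    while pos < len(raw):
        try:
            out.append(raw[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as exc:
            # exc.start is relative to the slice raw[pos:].
            bad_index = pos + exc.start
            out.append(raw[pos:bad_index].decode("utf-8"))
            out.append("\x00" + chr(raw[bad_index]))
            pos = bad_index + 1
    return "".join(out)


def nul_quote_encode(text: str) -> bytes:
    """Inverse of nul_quote_decode for strings it produced."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "\x00":
            # The character after the NUL carries the raw byte value.
            out.append(ord(text[i + 1]))
            i += 2
        else:
            out += text[i].encode("utf-8")
            i += 1
    return bytes(out)


raw_name = b"caf\xe9.txt"          # not valid UTF-8
quoted = nul_quote_decode(raw_name)

# The quoted form contains no surrogates, so a strict encoder accepts it.
as_standard_utf8 = quoted.encode("utf-8")
```

The round trip nul_quote_encode(quoted) gives back raw_name, and the intermediate string can still be handled by a standard UTF-8 encoder. The scheme is unambiguous only because a real UNIX path name never contains a null byte.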

So, why do you prefer half surrogate coding to U+0000 quoting?

Tom

(*) There's actually a second, subtle issue. PEP 383 intends utf-8b only to be used for file names. But that means that I might have to bind the first argument to TIFFOpen with utf-8b conversion, while I might have to bind other arguments with utf-8 conversion.


