[Python-Dev] what Windows and Linux really do Re: PEP 383 (again)
Thomas Breuel tmbdev at gmail.com
Thu Apr 30 09:21:54 CEST 2009
- Previous message: [Python-Dev] Proposed: add support for UNC paths to all functions in ntpath
- Next message: [Python-Dev] what Windows and Linux really do Re: PEP 383 (again)
Given the stated rationale of PEP 383, I was wondering what Windows actually does. So, I created some ISO8859-15 and ISO8859-8 encoded file names on a device, plugged them into my Windows Vista machine, and fired up Python 3.0.
First, os.listdir("f:") returns a list of strings for those file names... but those Unicode strings are illegal.
You can't even print them without getting an error from Python. In fact, you can't print strings containing the proposed half-surrogate encodings either: in both cases, the output encoder rejects them with a UnicodeEncodeError. (If not even Python, with its generally lenient attitude, can print those things, other libraries will probably fail, too.)
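To make the printing problem concrete, here is a minimal sketch of the half-surrogate behaviour. It assumes a later Python (3.1 or newer), where the PEP 383 error handler is available under the name "surrogateescape"; the file name and the helper can_encode_strict are made up for illustration and are not from my experiment.

```python
# Hypothetical file name, encoded as ISO 8859-15: "ä" becomes the single byte 0xE4.
raw_name = "k\u00e4se.txt".encode("iso8859-15")   # b'k\xe4se.txt'

# Decoding as UTF-8 with the PEP 383 handler maps the undecodable byte 0xE4
# to the lone (half) surrogate U+DCE4 instead of raising an error.
decoded = raw_name.decode("utf-8", "surrogateescape")


def can_encode_strict(s, encoding="utf-8"):
    """Return True if s can be encoded with the default (strict) error handling."""
    try:
        s.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# A strict encoder, which is what print() normally goes through, rejects the string.
printable = can_encode_strict(decoded)

# Only an encoder using the same handler turns the surrogate back into the byte.
restored = decoded.encode("utf-8", "surrogateescape")
```

So the round trip back to bytes works, but only for code that knows to ask for the handler; anything that encodes strictly gets the UnicodeEncodeError described above.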
What about round tripping? If you take a malformed file name from an external device (say, because it was actually encoded in ISO8859-15 or an East Asian encoding) and write it to an NTFS directory, Windows seems to write malformed UTF-16 file names. In essence, Windows doesn't really use Unicode; it just implements raw 16-bit character strings, just as UNIX historically implements raw 8-bit character strings.
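As an illustration of what "malformed UTF-16" means here, the following sketch builds a sequence of 16-bit units containing an unpaired surrogate. It does not touch NTFS at all; the byte values are invented for the example, and the "surrogatepass" handler for UTF-16 is assumed to be available (Python 3.4 or newer).

```python
# Two 16-bit code units, little-endian: 0xD800 (a high surrogate with no
# low surrogate after it) followed by 0x0041 ("A").
units = b"\x00\xd8" + b"A\x00"

# A strict UTF-16 decoder refuses the sequence because the surrogate is unpaired.
try:
    units.decode("utf-16-le")
    well_formed = True
except UnicodeDecodeError:
    well_formed = False

# Treating the data as raw 16-bit units (no validation) accepts it as-is
# and reproduces the same bytes on the way back out.
as_raw_units = units.decode("utf-16-le", "surrogatepass")
round_tripped = as_raw_units.encode("utf-16-le", "surrogatepass")
```

A file system that stores names as raw 16-bit strings behaves like the second case: it never notices that the sequence is not valid UTF-16.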
Then I tried the same thing on my Ubuntu 9.04 machine. It turns out that, unlike Windows, Linux seems to be moving to consistent use of valid UTF-8. If you plug in an external device and nothing else is known about it, it gets mounted with the utf8 option and the kernel actually seems to enforce UTF-8 encoding. I think this calls into question the rationale behind PEP 383, and we should first look into what the roadmap for UNIX/Linux and UTF-8 actually is. UNIX may have consistent Unicode support (via UTF-8) before Windows.
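For anyone who wants to check their own mounts, a rough sketch along these lines lists a directory as bytes and reports the names that are not valid UTF-8. The helper names is_valid_utf8 and invalid_names are hypothetical, and this only inspects what the kernel hands back; it says nothing about how the names are stored on the device.

```python
import os


def is_valid_utf8(name):
    """Return True if the raw bytes of a file name decode as strict UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def invalid_names(directory=b"."):
    """List the entries of a directory whose raw names are not valid UTF-8.

    Passing a bytes path makes os.listdir return the names as bytes,
    without any decoding applied by Python.
    """
    return [name for name in os.listdir(directory) if not is_valid_utf8(name)]
```

On a mount where UTF-8 really is enforced, invalid_names() on the mount point should come back empty.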
As I was saying, I think PEP 383 needs a lot more thought and research...
Tom