[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
Glenn Linderman v+python at g.nevcal.com
Wed Apr 29 23:09:26 CEST 2009
On approximately 4/29/2009 1:28 PM, came the following characters from the keyboard of Martin v. Löwis:
>>>> C. File on disk with the invalid surrogate code, accessed via the str interface, no decoding happens, matches in memory the file on disk with the byte that translates to the same surrogate, accessed via the bytes interface. Ambiguity.
>>>
>>> What does that mean? What specific interface are you referring to to obtain file names?
>>
>> os.listdir("")
>> os.listdir(b"")
>>
>> So I guess I'd better suggest that a specific, equivalent directory name be passed in either bytes or str form.
>
> [Leaving the issue of the empty string apparently having different meanings aside ...]
>
> Ok. Now I understand the example. So you do
>
> os.listdir("c:/tmp")
> os.listdir(b"c:/tmp")
>
> and you have a file in c:/tmp that is named "abc\uDC10".
>
>> So what you are saying here is that Python doesn't use the "A" forms of the Windows APIs for filenames, but only the "W" forms, and uses lossy decoding (from MS) to the current code page (which can never be UTF-8 on Windows).
>
> Actually, it does use the A form, in the second listdir example. This, in turn (inside Windows), uses the lossy CP_ACP encoding. You get back a byte string; the listdirs should give
>
> ["abc\uDC10"]
> [b"abc?"]
>
> (not quite sure about the second - I only guess that CP_ACP will replace the half surrogate with a question mark). So where is the ambiguity here?
None. But not everyone can read all the Python source code to try to understand it; they expect the documentation to help them avoid that. Because the documentation is lacking in this area, your concisely stated PEP is rather hard to understand.
Thanks for clarifying the Windows behavior here. A little more clarification in the PEP could have avoided a lot of discussion. It would seem that a PEP proposing to modify a poorly documented (and therefore likely poorly understood) area should educate readers about the status quo as well as present the suggested change. Or is it the Python philosophy that PEPs should be as incomprehensible as possible, to generate large discussions?
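The sketch below uses only Python 3's standard codecs to make the os.listdir exchange concrete, and it touches no files. It assumes a UTF-8 file system encoding on POSIX, and the file name b"abc\xff" is hypothetical. The code shows three things. First, the "surrogateescape" error handler from PEP 383 maps each undecodable byte 0xNN to the lone surrogate U+DCNN and maps it back on encoding. Second, a name like "abc\uDC10" can never come out of that handler, because only bytes 0x80 to 0xFF are escaped. Third, it stands in for the lossy CP_ACP conversion Martin describes by using Python's cp1252 codec with errors="replace". That substitution is an assumption, not the code Windows actually runs. Also note that this reflects the 2009 behavior. Since Python 3.6 (PEP 529), bytes paths on Windows are decoded as UTF-8 and passed to the "W" APIs.

```python
# PEP 383 "surrogateescape" round trip (POSIX, assuming a UTF-8 file system
# encoding) plus a stand-in for the lossy Windows "A" API conversion.

# Hypothetical POSIX file name: byte 0xFF is not valid UTF-8.
raw_name = b"abc\xff"

# Decoding turns each undecodable byte 0xNN into the lone surrogate U+DCNN.
str_name = raw_name.decode("utf-8", "surrogateescape")
print(ascii(str_name))  # 'abc\udcff'

# Encoding with the same handler restores the original bytes exactly.
assert str_name.encode("utf-8", "surrogateescape") == raw_name

# The handler only produces U+DC80..U+DCFF, so U+DC10 (as in "abc\uDC10")
# is a genuine lone surrogate, not an escaped byte, and will not encode.
try:
    "abc\udc10".encode("utf-8", "surrogateescape")
    dc10_round_trips = True
except UnicodeEncodeError:
    dc10_round_trips = False
print(dc10_round_trips)  # False

# Stand-in (assumption) for Windows' CP_ACP conversion in the "A" API:
# cp1252 with errors="replace" substitutes "?" for the half surrogate.
lossy = "abc\udc10".encode("cp1252", "replace")
print(lossy)  # b'abc?'
```

Under these assumptions, the str result keeps the half surrogate while the bytes result replaces it with "?". That matches the two listdir results Martin predicts, and it explains why the two forms cannot collide.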
-- Glenn -- http://nevcal.com/
A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking