[Python-Dev] Use our strict mbcs codec instead of the Windows ANSI API

Victor Stinner victor.stinner at haypocalc.com
Tue Oct 25 22:18:13 CEST 2011

Previous message: [Python-Dev] Use our strict mbcs codec instead of the Windows ANSI API
Next message: [Python-Dev] memcmp performance

On Tuesday 25 October 2011 00:57:42, Victor Stinner wrote:

I propose to raise Unicode errors if a filename cannot be decoded on Windows, instead of creating bogus filenames with question marks. Because this change is incompatible with Python 3.2, even if such filenames are unusable and I consider the problem a (Python?) bug, I would like your opinion on such a change before working on a patch.

Most people like the idea, so I wrote a patch and attached it to:

http://bugs.python.org/issue13247

The patch only changes os.getcwdb() and os.listdir().
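As a rough illustration of the behaviour being discussed (this is not code from the patch), the sketch below contrasts a lossy encoding with a strict one. It assumes cp1252 as a stand-in for the ANSI code page so that it runs on any platform; on Windows the codec in question is "mbcs". The `replace` error handler only approximates what the ANSI API does, and `encode_strict` is a hypothetical helper name.

```python
# Assumption: cp1252 stands in for the Windows ANSI code page ("mbcs").
ANSI = "cp1252"

# A filename made of characters that cp1252 cannot represent.
name = "\u65e5\u672c\u8a9e.txt"

# Lossy encoding: every unencodable character becomes "?", which yields
# a bogus filename that no longer identifies the real file.
lossy = name.encode(ANSI, errors="replace")


def encode_strict(filename, encoding=ANSI):
    """Hypothetical helper: encode a filename or raise UnicodeEncodeError."""
    return filename.encode(encoding, errors="strict")


try:
    encode_strict(name)
    strict_failed = False
except UnicodeEncodeError:
    # Strict mode reports the problem instead of hiding it.
    strict_failed = True
```

With the strict variant, the caller learns immediately that the bytes API cannot represent the name and can switch to the Unicode API instead.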

We might use PEP 383 to store undecodable bytes as surrogates (U+DC80-U+DCFF). But the situation is the opposite of the situation on UNIX: on Windows, the problem is more on encoding (text->bytes) than on decoding (bytes->text). On UNIX, problems occur when the system is misconfigured (e.g. wrong locale encoding). On Windows, problems occur when your application uses the old (ANSI) API, whereas your filesystem is fully Unicode compliant and you created Unicode filenames with a program using the new (Windows) API.
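A small sketch of why PEP 383 fits the UNIX case but not the Windows one, under the assumption that UTF-8 plays the role of the locale encoding and cp1252 the role of the ANSI code page (both are stand-ins chosen so the example runs anywhere):

```python
# UNIX direction (bytes -> text): an undecodable byte is stored as a lone
# surrogate in U+DC80-U+DCFF and can be turned back into the same byte.
raw = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8
text = raw.decode("utf-8", errors="surrogateescape")
roundtrip = text.encode("utf-8", errors="surrogateescape")

# Windows direction (text -> bytes): the filename contains real characters,
# not escaped bytes, so surrogateescape has nothing to map them to.
unicode_name = "\u65e5\u672c\u8a9e.txt"
try:
    unicode_name.encode("cp1252", errors="surrogateescape")
    encodable = True
except UnicodeEncodeError:
    encodable = False
```

The first half round-trips without loss; the second half still fails, which is the encoding-side problem described above.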

I only changed functions that return filenames, so os.mkdir(), for example, is unchanged.

We may also patch the other functions to simplify the source code.

Victor
