[Python-Dev] Windows: Remove support of bytes filenames in the os module?
eryk sun eryksun at gmail.com
Tue Feb 9 08:33:19 EST 2016
On Tue, Feb 9, 2016 at 3:22 AM, Victor Stinner <victor.stinner at gmail.com> wrote:
2016-02-09 1:37 GMT+01:00 eryk sun <eryksun at gmail.com>:
For example, in codepage 932 (Japanese), it's an error if a lead byte (i.e. 0x81-0x9F, 0xE0-0xFC) is followed by a trailing byte with a value less than 0x40 (note that ASCII 0-9 is 0x30-0x39, so this is not uncommon). In this case the ANSI API substitutes the default character for Japanese, '・' (U+30FB, Katakana middle dot).
    >>> locale.getpreferredencoding()
    'cp932'
    >>> open(b'\xe05', 'w').close()
    >>> os.listdir('.')
    ['・']
    >>> os.listdir(b'.')
    [b'\x81E']

All invalid sequences get mapped to '・', which roundtrips as b'\x81\x45', so you can't reliably create and open files with arbitrary bytes paths in this locale.

Oh, and I forgot to ask: what is your filesystem? Is it the same behaviour for NTFS, FAT32, network shared directories, etc.?
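The rule in the quoted example can be sketched in portable Python. This is only an illustration, not the Windows implementation: it checks the single invalid case described above (a lead byte followed by a byte below 0x40), while real cp932 validation has further cases. The helper names are hypothetical, and treating a lead byte at the very end of the name as invalid is an assumption of this sketch. The last line shows how U+30FB encodes in Python's own cp932 codec, which is where the b'\x81E' name comes from.

```python
MIDDLE_DOT = "\u30fb"  # default character that the message says gets substituted


def is_lead_byte(value: int) -> bool:
    """Lead-byte ranges for code page 932 as given in the message."""
    return 0x81 <= value <= 0x9F or 0xE0 <= value <= 0xFC


def has_bad_trail_byte(name: bytes) -> bool:
    """Return True if a lead byte is followed by a byte below 0x40.

    Assumption of this sketch: a lead byte with nothing after it
    also counts as invalid.
    """
    i = 0
    while i < len(name):
        if is_lead_byte(name[i]):
            if i + 1 >= len(name) or name[i + 1] < 0x40:
                return True
            i += 2  # skip the trail byte of a well-formed pair
        else:
            i += 1
    return False


print(has_bad_trail_byte(b"\xe05"))  # lead byte 0xE0 followed by ASCII '5' (0x35)
print(has_bad_trail_byte(b"\x81E"))  # the pair that encodes the middle dot
print(MIDDLE_DOT.encode("cp932"))    # bytes form of the substituted character
```

Any name for which the check returns True would, per the message, come back from the bytes API as the same b'\x81E', so distinct invalid names collapse onto one.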
That was tested using NTFS, but the same would apply to FAT32, exFAT, and UDF since they all use Unicode. CreateFile[A|W] wraps the NtCreateFile system call. The NT executive is Unicode, so the system call receives the filename using a Unicode-only OBJECT_ATTRIBUTES record. I can't say what an arbitrary non-Microsoft filesystem will do with the U+30FB character when it processes the IRP_MJ_CREATE. I was only concerned with ANSI<=>Unicode conversion that's implemented in the ntdll.dll runtime library.
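To look at the ANSI-to-Unicode step in isolation, without touching any filesystem, something like the following ctypes sketch could be used. It calls the documented Win32 function MultiByteToWideChar as a stand-in for the conversion the ANSI file API performs internally; whether it matches the ntdll.dll routine byte for byte is an assumption. The function name ansi_to_unicode is hypothetical, the call needs Windows with code page 932 available, and no output is claimed here.

```python
import ctypes
import sys

CP_JAPANESE = 932  # Windows code page number discussed in this thread


def ansi_to_unicode(raw: bytes, codepage: int = CP_JAPANESE):
    """Convert bytes to str via MultiByteToWideChar.

    Returns None on non-Windows platforms, where the API does not exist.
    """
    if sys.platform != "win32":
        return None
    if not raw:
        return ""
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    convert = kernel32.MultiByteToWideChar
    convert.argtypes = [
        ctypes.c_uint,     # CodePage
        ctypes.c_uint32,   # dwFlags
        ctypes.c_char_p,   # lpMultiByteStr
        ctypes.c_int,      # cbMultiByte
        ctypes.c_wchar_p,  # lpWideCharStr
        ctypes.c_int,      # cchWideChar
    ]
    convert.restype = ctypes.c_int
    # At most one UTF-16 code unit is produced per input byte.
    buf = ctypes.create_unicode_buffer(len(raw))
    count = convert(codepage, 0, raw, len(raw), buf, len(raw))
    if count == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    return buf[:count]


if __name__ == "__main__":
    # On Windows this prints whatever the conversion yields for the
    # invalid sequence from the example; elsewhere it prints None.
    print(repr(ansi_to_unicode(b"\xe05")))
```

Passing dwFlags as 0 asks for the default substitution behaviour rather than an error on invalid input, which is the case under discussion.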