[Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

"Martin v. Löwis" martin at v.loewis.de
Tue Sep 30 22:45:55 CEST 2008


> I'm not sure either way. I've heard it claimed that Windows filesystem APIs use Unicode natively. Does Python 3.0 on Windows currently support filenames expressed as bytes?

Yes, it does (at least os.open and os.stat support them; the builtin open doesn't).

> Are they encoded first before passing to the Unicode APIs? Using what encoding?

Python doesn't pass them to the Unicode (W) APIs. Instead, it passes them to the "ANSI" (A) APIs (i.e., the CP_ACP APIs). On Windows NT and later, those APIs then convert the bytes to Unicode using the CP_ACP (aka "mbcs") encoding; this conversion happens inside the system DLLs.
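A rough Python sketch can make that conversion path concrete. It mimics what an "A" API does with a bytes filename before handing it to its "W" counterpart. This is an illustration only, not Windows' actual code. The helper name ansi_to_unicode is hypothetical. Outside Windows the "mbcs" codec does not exist, so there the sketch assumes cp1252 as a stand-in for the ANSI code page.

```python
import sys

# Assumption: on Windows, Python's "mbcs" codec uses the active ANSI code
# page (CP_ACP). Elsewhere "mbcs" is unavailable, so cp1252 (Western
# European) serves as a hypothetical stand-in for the ANSI code page.
ANSI_CODEC = "mbcs" if sys.platform == "win32" else "cp1252"


def ansi_to_unicode(name: bytes, codec: str = ANSI_CODEC) -> str:
    """Hypothetical helper: decode a bytes filename the way an "A" API
    would before calling the corresponding "W" (Unicode) API."""
    # Bytes that are undefined in the code page become U+FFFD here.
    return name.decode(codec, errors="replace")


# Passing the codec explicitly makes the result independent of the system.
print(ansi_to_unicode(b"caf\xe9.txt", "cp1252"))  # café.txt
```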

CP_ACP is lossy in the Unicode-to-bytes direction: Microsoft substitutes replacement characters where it can, first trying similar-looking characters and then falling back to question marks.
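The loss can be illustrated the same way. This is a minimal sketch, again assuming cp1252 as a stand-in for the ANSI code page, and the helper unicode_to_ansi is hypothetical. Python's cp1252 codec has no best-fit table, so unlike Windows it never tries a look-alike character and goes straight to question marks.

```python
def unicode_to_ansi(name: str, codec: str = "cp1252") -> bytes:
    """Hypothetical helper: encode a Unicode filename into a legacy code
    page, replacing unrepresentable characters with "?".

    Note: Python's codec skips Windows' best-fit (look-alike) step and
    only performs the question-mark fallback."""
    return name.encode(codec, errors="replace")


original = "Łódź.txt"
encoded = unicode_to_ansi(original)
round_trip = encoded.decode("cp1252")

print(encoded)                 # b'?\xf3d?.txt'
print(round_trip == original)  # False
```

Once "Ł" and "ź" have become "?", the original name cannot be recovered from the bytes.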

Regards, Martin


