[Python-Dev] Use our strict mbcs codec instead of the Windows ANSI API

Victor Stinner victor.stinner at haypocalc.com
Tue Oct 25 10:22:26 CEST 2011


On Tuesday 25 October 2011 13:20:12, you wrote:

> Victor Stinner writes:
> > I propose to raise Unicode errors if a filename cannot be decoded
> > on Windows, instead of creating bogus filenames with question
> > marks.

> By "bogus" you mean "sometimes (?) invalid and the OS will refuse to use them, causing a later hard-to-diagnose exception", rather than "not what the user thinks he wants", right?

If the ("Unicode") filename cannot be encoded to the ANSI code page, which is usually a small charset (e.g. cp1252 contains 256 code points), Windows replaces unencodable characters with question marks.

Imagine that the code page is ASCII: the ("Unicode") filename "hého.txt" will be encoded to b"h?ho.txt". You can display this string in a dialog, but you cannot open the file to read its content... If you pass the filename to os.listdir(), it is even worse, because "?" is interpreted as a wildcard ("?" means any character; it is a pattern to match a filename).
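To make the example above concrete, here is a minimal sketch using Python's own codecs. It assumes the "replace" error handler is a fair stand-in for what the ANSI API does silently, and it uses fnmatch only as an analogy for the wildcard meaning of "?" (it is not how os.listdir() is implemented on Windows).

```python
import fnmatch

name = "hého.txt"

# errors="replace" mimics the silent substitution: "é" becomes "?".
lossy = name.encode("ascii", errors="replace")
print(lossy)

# errors="strict" is the behaviour proposed in this thread: fail loudly.
try:
    name.encode("ascii", errors="strict")
except UnicodeEncodeError as exc:
    error = exc
    print("cannot encode:", exc.object[exc.start:exc.end])

# Analogy only: as a pattern, "?" matches any single character, so the
# bogus name also matches unrelated files such as "haho.txt".
matches_other_file = fnmatch.fnmatchcase("haho.txt", lossy.decode("ascii"))
print(matches_other_file)
```

With the strict handler, the caller learns which character could not be encoded instead of receiving a name that no longer identifies the file.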

I would like to raise an error in such a situation, because currently the user cannot be notified otherwise. The user may search for "?" in the filename, but Windows also replaces unencodable characters with similar glyphs (e.g. "é" is replaced by "e").
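Since searching for "?" misses the look-alike substitutions, one way to detect both kinds of loss is a round-trip check. The sketch below is only an illustration: best_fit_like() is a hypothetical stand-in that strips accents with unicodedata, not the real Windows best-fit tables, and round_trips() is a name made up for this example.

```python
import unicodedata

def best_fit_like(text):
    """Hypothetical imitation of best-fit mapping: drop accents, keep base letters."""
    decomposed = unicodedata.normalize("NFKD", text)
    # The combining accent is not ASCII, so errors="ignore" discards it.
    return decomposed.encode("ascii", errors="ignore")

def round_trips(text, encoded, encoding="ascii"):
    """Return True only if decoding the bytes gives back the original name."""
    return encoded.decode(encoding) == text

name = "hého.txt"
encoded = best_fit_like(name)
print(encoded)                     # no "?" to search for
print(round_trips(name, encoded))  # the loss is still detectable
```

A plain ASCII name such as "hello.txt" passes the same check, so only names that were actually altered are reported.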

> In the "hard errors" case, a hearty +1 (I'm dealing with this in an experimental version of XEmacs and it's a right PITA if the codec doesn't complain).

If you use MultiByteToWideChar and WideCharToMultiByte, you can be notified of errors by passing some flags, but the functions of the ANSI API don't give access to these flags...
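As a rough illustration of what such flags make possible, here is a ctypes sketch around WideCharToMultiByte. It is not the code of the proposed codec: encode_ansi_strict() is a hypothetical helper, and the choice of WC_NO_BEST_FIT_CHARS together with the lpUsedDefaultChar output parameter is an assumption about which flags are meant. It only does real work on Windows, and it assumes the ANSI code page is not UTF-8 (for which lpUsedDefaultChar must be NULL).

```python
import ctypes
import sys

CP_ACP = 0                     # use the current ANSI code page
WC_NO_BEST_FIT_CHARS = 0x0400  # forbid look-alike substitutions ("é" -> "e")

def encode_ansi_strict(text):
    """Hypothetical helper: encode to the ANSI code page, or raise on any loss."""
    if sys.platform != "win32":
        raise OSError("WideCharToMultiByte is only available on Windows")
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    convert = kernel32.WideCharToMultiByte
    convert.argtypes = [
        wintypes.UINT, wintypes.DWORD,   # CodePage, dwFlags
        wintypes.LPCWSTR, ctypes.c_int,  # lpWideCharStr, cchWideChar
        wintypes.LPSTR, ctypes.c_int,    # lpMultiByteStr, cbMultiByte
        wintypes.LPCSTR,                 # lpDefaultChar
        ctypes.POINTER(wintypes.BOOL),   # lpUsedDefaultChar
    ]
    convert.restype = ctypes.c_int

    # First call: cchWideChar=-1 means NUL-terminated input; ask for the size.
    size = convert(CP_ACP, WC_NO_BEST_FIT_CHARS, text, -1, None, 0, None, None)
    if size == 0:
        raise ctypes.WinError(ctypes.get_last_error())

    # Second call: convert, and let Windows report any substitution.
    buffer = ctypes.create_string_buffer(size)
    used_default = wintypes.BOOL(False)
    written = convert(CP_ACP, WC_NO_BEST_FIT_CHARS, text, -1,
                      buffer, size, None, ctypes.byref(used_default))
    if written == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    if used_default.value:
        raise UnicodeEncodeError("mbcs", text, 0, len(text),
                                 "character not representable in the ANSI code page")
    return buffer.raw[:written - 1]  # drop the trailing NUL byte
```

The ANSI ("A") variants of the file functions perform a similar conversion internally but never expose the used-default indicator, which is why the caller has no way to notice the substitution.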

> Backward compatibility is important, but here the costs of fixing such bugs outweigh the value of bug-compatibility.

I only want to change how unencodable filenames are handled; the bytes API will still be available. If your filesystem has the "8dot3name" feature enabled, it may work even for unencodable filenames (Windows generates names like HEHO~1.TXT).

Victor


