[Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

Neil Hodgson nyamatongwe at gmail.com
Sat Jul 16 09:30:07 CEST 2005


Martin v. Löwis:

> But then, the wide API gives all results as Unicode. If you want to promote only those entries that need it, it really means that you only want to "demote" those that don't need it. But how can you tell whether an entry needs it? There is no API to find out.

I wrote a patch for os.listdir at http://www.scintilla.org/difft.txt that uses WideCharToMultiByte to check whether a wide name can be represented in a particular code page, and only uses that representation if it fits. This works for Windows code pages, including ASCII and "mbcs". But since Python's sys.getdefaultencoding() can be something that has no code page equivalent, the code would have to try converting in strict mode and treat failure as meaning the name stays unicode.
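A minimal sketch of that strict-mode fallback, not the patch itself (the patch is C and calls WideCharToMultiByte). It is written in modern Python 3 terms, where str stands in for the unicode type and bytes for the byte string; the helper name demote_if_representable and the sample file names are made up for illustration.

```python
import sys

def demote_if_representable(name, encoding=None):
    """Return name as a byte string if it encodes losslessly, else leave it as unicode.

    Hypothetical helper: 'name' is a unicode (str) file name, 'encoding'
    defaults to Python's default encoding.
    """
    if encoding is None:
        encoding = sys.getdefaultencoding()
    try:
        # "strict" raises UnicodeEncodeError on any character the
        # encoding cannot represent, so success means the name fits.
        return name.encode(encoding, "strict")
    except UnicodeEncodeError:
        # Failure is interpreted as "keep the unicode name".
        return name

# Sample names: pure ASCII, Latin-1 range, and Japanese.
names = ["readme.txt", "caf\u00e9.txt", "\u65e5\u672c\u8a9e.txt"]
results = [demote_if_representable(n, "cp1252") for n in names]
```

With "cp1252" the first two names come back as byte strings and the third stays unicode, which gives the mixed result list that the patch would produce.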

> You could declare that anything with characters >128 needs it, but that would be an incompatible change: If a character >128 in the system code page is in a file name, listdir currently returns it in the system code page. It then would return a Unicode string.

I now quite like returning unicode for anything non-ASCII on Windows, as there is no ambiguity in what the result means and there will be no need to change all the system calls to translate from the default encoding. It is a change to the API which can lead to code breaking, but it should break with an exception. Assuming that byte string arguments use Python's default encoding looks more dangerous: that is a behavioural change with no notification.
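As a rough illustration of the "break with an exception" point, here is a sketch of the ASCII-only rule in Python 3 terms (str for unicode, bytes for byte strings). The function name listdir_result and the paths are hypothetical, and the exact exception in the Python of 2005 would differ in detail; the sketch only shows that code assuming byte strings fails visibly.

```python
def listdir_result(name):
    """Byte string for pure-ASCII names, unicode (str) for everything else."""
    try:
        return name.encode("ascii", "strict")
    except UnicodeEncodeError:
        return name

plain = listdir_result("notes.txt")      # stays a byte string
fancy = listdir_result("caf\u00e9.txt")  # comes back as unicode

# Caller code that assumed byte strings fails loudly on the unicode name
# instead of silently producing a wrongly encoded path.
try:
    b"C:\\data\\" + fancy
    failed = False
except TypeError:
    failed = True
```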

Neil


