[Python-Dev] Python-3.0, unicode, and os.environ

Nick Coghlan ncoghlan at gmail.com
Sun Dec 7 22:49:41 CET 2008


Terry Reedy wrote:
> Toshio Kuratomi wrote:
>> - If this is true, a definition of os.listdir(<type 'str'>) that would better meet programmer expectation would be: "Give me all files in a directory with the output as str type". The definition of os.listdir(<type 'bytes'>) would be "Give me all files in a directory with the output as bytes type". Raising an exception when the filenames are undecodable is perfectly reasonable in this situation.
>
> Your examples (snipped) pretty well convince me that there is a use case for raising exceptions. We should move beyond arguing over which one way is right. I think there should be a second argument 'ignorebad=False' to ignore undecodable files rather than raise the exception (or 'strict=True' to stop and raise exception on non-decodable names -- then code is 'if strict: raise ...'). I believe other functions have a similar parameter.

If we were going to do anything like that for os.listdir() and other filesystem APIs (like glob) that return multiple paths, we'd probably be best advised to just have a normal Unicode 'errors' parameter which allowed:

'strict' - raise an Exception for malformed binary data
'replace' - insert '?' or some other symbol in place of malformed binary data
'ignore' - simply leave out the malformed binary data
'skip' - run the underlying codec in strict mode, but skip over any items which raise UnicodeDecodeError (default/current Py3k behaviour)
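As a rough sketch of how those four modes might behave, here is a pure-Python helper operating on a list of bytes filenames. The name decode_names and its signature are hypothetical, invented purely for illustration; nothing like it exists in os, and 'skip' is not a real codec error handler. The other three modes are simply passed through to bytes.decode().

```python
def decode_names(raw_names, encoding="utf-8", errors="skip"):
    """Decode bytes filenames according to the proposed 'errors' modes.

    Hypothetical helper for illustration only: neither this function nor
    the 'skip' mode is part of the os or codecs modules.
    """
    decoded = []
    for raw in raw_names:
        if errors == "skip":
            # Decode strictly, but drop any name that fails to decode.
            try:
                decoded.append(raw.decode(encoding, "strict"))
            except UnicodeDecodeError:
                continue
        else:
            # 'strict', 'replace' and 'ignore' are standard codec handlers.
            # Note that 'replace' inserts U+FFFD when decoding, not '?'.
            decoded.append(raw.decode(encoding, errors))
    return decoded


names = [b"ok.txt", b"caf\xe9.txt"]  # the second name is not valid UTF-8
print(decode_names(names, errors="skip"))     # undecodable name left out
print(decode_names(names, errors="ignore"))   # bad byte dropped from the name
print(decode_names(names, errors="replace"))  # bad byte replaced with U+FFFD
```

With errors="strict" the same call raises UnicodeDecodeError on the second name, which is the behaviour Toshio was arguing for.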

Obviously, 'skip' doesn't make any sense for APIs like getcwd() that return a single value - a case could be made for those defaulting to either replace or strict.
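To illustrate the single-value case, a minimal sketch might reject 'skip' outright and default to strict. The function decode_single is a hypothetical name, not an existing API, and choosing 'strict' over 'replace' as the default here is only one of the two options mentioned above.

```python
def decode_single(raw, encoding="utf-8", errors="strict"):
    """Decode one bytes path, as a getcwd()-style API would need to.

    Hypothetical helper for illustration only; the 'strict' default is an
    assumption, and 'replace' would be an equally defensible choice.
    """
    if errors == "skip":
        # There is only one value, so there is nothing to skip over.
        raise ValueError("'skip' only applies to APIs returning multiple paths")
    return raw.decode(encoding, errors)


print(decode_single(b"/tmp/caf\xe9", errors="replace"))
```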

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia


