[Python-Dev] Python-3.0, unicode, and os.environ
Guido van Rossum guido at python.org
Mon Dec 8 19:26:46 CET 2008
- Previous message: [Python-Dev] Python-3.0, unicode, and os.environ
- Next message: [Python-Dev] Python-3.0, unicode, and os.environ
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy <tjreedy at udel.edu> wrote:
Guido van Rossum wrote:
On Sun, Dec 7, 2008 at 1:20 PM, Terry Reedy <tjreedy at udel.edu> wrote:
Toshio Kuratomi wrote:
- If this is true, a definition of os.listdir(<type 'str'>) that would better meet programmer expectation would be: "Give me all files in a directory with the output as str type". The definition of os.listdir(<type 'bytes'>) would be "Give me all files in a directory with the output as bytes type". Raising an exception when the filenames are undecodable is perfectly reasonable in this situation.

Your examples (snipped) pretty well convince me that there is a use case for raising exceptions. We should move beyond arguing over which one way is right. I think there should be a second argument 'ignorebad=False' to ignore undecodable files rather than raise the exception (or 'strict=True' to stop and raise exception on non-decodable names -- then code is 'if strict: raise ...'). I believe other functions have a similar parameter.

I was thinking of the "normal Unicode 'errors' parameter", as described by Nick.

If you want the exceptions, just use the bytes API and try to decode the byte strings using the system encoding.

If it was a matter of adding a new method, I might agree. But:

1. We already have a method that does exactly what you describe. It is only a matter of adding flexibility to the response to problems, for which there is already precedent.
2. Suggesting that people who want strings and not bytes should have to deal with bytes, just to get an error notification, seems to negate the point of moving to 3.0.
3. A builtin would probably do so better than most programmers would, with little touches such as the one suggested below.
4. An error parameter would ALERT programmers to the possibility of a PROBLEM, both in the present and future.

As you say below, people need to better anticipate the future.

My problem with raising exceptions by default when an undecodable name exists is that it may render an app completely useless in a situation where the developer is no longer around. This happened all the time with the 2.x Unicode API, where the developer hadn't anticipated a particular input potentially containing non-ASCII bytes, and the user fed the application non-ASCII text. Making os.listdir raise an exception when a directory contains a single undecodable file means that the entire directory can't be read, and most likely the entire app crashes at that point. Most likely the developer never anticipated this situation (since in most places it is either impossible or very unlikely) -- after all, if they had anticipated it they would have used the bytes API in the first place. (It's worse because the exception being raised would be UnicodeError -- most people expect os.listdir to raise OSError, not other errors.)

This to me is an argument for keeping the default the current behavior, but not for rejecting flexibility. The computing world seems to be messier than we would like and worse than I realized until this week. As you say below, people need to better anticipate the future, and an errors parameter would help do that.
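The bytes-API route mentioned above ("just use the bytes API and try to decode the byte strings using the system encoding") might look roughly like the following on a current Python 3. This is only a sketch under stated assumptions: decode_names and listdir_strict are hypothetical helper names, not part of the os module, and sys.getfilesystemencoding() is assumed to be an adequate stand-in for "the system encoding".

```python
import os
import sys


def decode_names(raw_names, encoding=None):
    """Strictly decode byte filenames (hypothetical helper).

    Raises UnicodeDecodeError on the first name that cannot be decoded.
    """
    if encoding is None:
        # Assumption: the filesystem encoding is "the system encoding".
        encoding = sys.getfilesystemencoding()
    return [name.decode(encoding, "strict") for name in raw_names]


def listdir_strict(path):
    """List a directory through the bytes API, then decode every name."""
    # Passing bytes to os.listdir makes it return bytes names.
    return decode_names(os.listdir(os.fsencode(path)))
```

Note that the failure here surfaces as UnicodeDecodeError rather than OSError, which is exactly the surprise described in the parenthetical above; a caller who opts into this helper has at least asked for it explicitly.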
I'm fine with whatever API enhancements you can come up with (assuming others like them too :-) as long as the default remains the current behavior.
Is Windows really immune? What about when it reads the directory of possibly old removable media with whatever byte name encodings? Is this a possible source of 'unanticipated' problems?
As to your last sentence, os.listdir() with an errors parameter could convert a decoding UnicodeError to "OSError: undecodable file name <ascii+hex repr>", thereby supplying the expected exception as well as an extractable representation of the problematic raw bytes.

Here is a possible use case: I want filenames as 3.0 strings and I anticipate no problems at present but, as you say above, something might happen years in the future. I am using 3.0 because of the strings == unicode feature. I would like to write

    try:
        files = os.listdir(somedir, errors='strict')
    except OSError as e:
        log()
        files = os.listdir(somedir)

and go on without the problem file but not without logging the problem so a future maintainer can consider what to do about it, but only when there is an actual need to think about it.

Terry Jan Reedy
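Since os.listdir() has no errors parameter, the use case above can only be approximated with a wrapper. The following is a minimal sketch, not the proposed API itself: decode_listing, listdir_with_errors and list_and_log are hypothetical names, the "ignore" mode is assumed to mean "silently drop undecodable names", and the OSError message format simply follows the wording suggested above.

```python
import logging
import os
import sys

log = logging.getLogger(__name__)


def decode_listing(raw_names, errors="ignore", encoding=None):
    """Decode byte filenames; hypothetical stand-in for an errors= mode."""
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    names = []
    for raw in raw_names:
        try:
            names.append(raw.decode(encoding, "strict"))
        except UnicodeDecodeError as exc:
            if errors == "strict":
                # Re-raise as the exception type callers of listdir expect,
                # carrying an ASCII repr of the raw bytes.
                raise OSError("undecodable file name %r" % (raw,)) from exc
            # errors="ignore" (assumed default): drop the name and carry on.
    return names


def listdir_with_errors(path, errors="ignore"):
    """Hypothetical os.listdir(path, errors=...) built on the bytes API."""
    return decode_listing(os.listdir(os.fsencode(path)), errors)


def list_and_log(path):
    """Try strict first; on failure, log it and fall back to skipping."""
    try:
        return listdir_with_errors(path, errors="strict")
    except OSError as exc:
        log.warning("problem listing %r: %s", path, exc)
        return listdir_with_errors(path, errors="ignore")
```

One caveat with this shape: a missing or unreadable directory also raises OSError, so the fallback call in list_and_log will simply raise again in that case. A dedicated OSError subclass would let callers tell the two situations apart.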
--
--Guido van Rossum (home page: http://www.python.org/~guido/)