[Python-Dev] Python-3.0, unicode, and os.environ
Guido van Rossum guido at python.org
Sat Dec 6 18:00:58 CET 2008
- Previous message: [Python-Dev] Python-3.0, unicode, and os.environ
- Next message: [Python-Dev] Python-3.0, unicode, and os.environ
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Fri, Dec 5, 2008 at 10:18 PM, Bugbee, Larry <larry.bugbee at boeing.com> wrote:
> There has been some discussion here that users should use the str or bytes function variant based on what is relevant to their system, for example when getting a list of file names or opening a file. That thought process really doesn't do much for those of us who write code that needs to run on any platform type, without alteration or the addition of complex if-statements and/or exceptions.
> Whatever the resolution here, and those of you addressing this thorny issue have my admiration, the solution should be such that it gives consistent behavior regardless of platform type and doesn't require the programmer to know all the minute details of each possible target platform.
My prediction is that it will never be possible to completely hide this difference between platforms. The platforms differ fundamentally in how they see filenames. An elaborate abstraction can certainly be created that smooths out most of the differences, but at some point useful functionality will have to be lost in order to maintain strict platform independence. This is the fate of most platform-independence abstractions, by the way. For example, there are many elaborate packages for platform-independent I/O, but they generally don't provide access to all the functionality that is available on a platform. Where they do, the application is once again placed in the position of having to use complex if-statements and/or exceptions.
Consider just this example. Many programs need to ask their user for the name of a file to be created by the program. On systems where filenames are raw byte strings, do you want to provide the user with a way to specify an arbitrary byte string? (That is, in addition to the normal case of entering a text string that will be transformed into a filename using some encoding.) Your choices are either not to support byte strings that aren't valid in the current encoding, or to add a UI element to select an encoding, or to add a UI element to enter raw bytes. An abstraction package is likely to support only the first option (this is what Java does, BTW), but this is not acceptable to all applications.
> That may not be possible for a while, so interim solutions should be such that they minimize later pain. If that means hiding "implementation details" behind a new function, so be it. Then, at least, the body of one's app is not burdened with this problem later when conditions change.
I believe the problem's severity is actually overstated. The interim solution with the least amount of pain that will work for almost all apps is to treat filenames as text strings encoded in some default encoding, and ignore filenames that aren't valid encodings of any text string. Yes, it is possible that you'll find that you can't completely remove or traverse certain directory trees. But that's a fact of life anyway (filesystems have many hidden failure modes), so you're better off dealing with that possibility than worrying over the issue of undecodable filenames.
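A minimal Python sketch of that interim approach, assuming directory entries are read as bytes and decoded strictly with the filesystem encoding. The helper names `decodable_names` and `list_text_filenames` are hypothetical, chosen for illustration only; they are not part of the standard library or of any proposal in this thread.

```python
import os
import sys


def decodable_names(raw_names, encoding):
    """Return the names that decode cleanly as text; skip the rest."""
    names = []
    for raw in raw_names:
        try:
            names.append(raw.decode(encoding))  # strict decoding
        except UnicodeDecodeError:
            # Not a valid encoding of any text string: ignore it.
            continue
    return names


def list_text_filenames(path="."):
    """List a directory as text strings, ignoring undecodable entries."""
    raw_names = os.listdir(os.fsencode(path))  # bytes path in, bytes names out
    return decodable_names(raw_names, sys.getfilesystemencoding())
```

The cost is the one noted above: an entry that is skipped here cannot be removed or traversed through this interface, so a caller has to tolerate directories that appear non-empty to the operating system.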
--
--Guido van Rossum (home page: http://www.python.org/~guido/)