[Python-Dev] str object going in Py3K
Guido van Rossum guido at python.org
Wed Feb 15 18:25:59 CET 2006
On 2/15/06, Fuzzyman <fuzzyman at voidspace.org.uk> wrote:
> Forcing the programmer to be aware of encodings also pushes the same requirement onto the user (who is often the source of the text in question).
The programmer shouldn't have to be aware of encodings most of the time -- it's the job of the I/O library to determine the end user's (as opposed to the language's) default encoding dynamically and act accordingly. Users who use non-ASCII characters without informing the OS of their encoding are in a world of pain, unless they use the OS default encoding (which may vary per locale). If the OS can figure out the default encoding, so can the Python I/O library. Many apps won't have to go beyond this at all.
Note that I don't want to use this OS/user default encoding as the default encoding between bytes and strings; once you are reading bytes you are writing "grown-up" code and you will have to be explicit. It's only the I/O library that should automatically encode on write and decode on read.
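A minimal sketch of the split described above, written against Python 3 as it was eventually released: text-mode I/O decodes on read and encodes on write, while bytes only become text through an explicit `decode()`. The helper names `user_default_encoding` and `roundtrip_text` are hypothetical, chosen for illustration. The encoding is passed explicitly here so the result is reproducible; when it is omitted, `open()` normally falls back to the locale's preferred encoding.

```python
import locale
import os
import tempfile


def user_default_encoding():
    """Return the encoding the OS/locale reports for the current user."""
    return locale.getpreferredencoding(False)


def roundtrip_text(text, encoding):
    """Write text via a text-mode file, then read it back as str and as bytes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Text mode: the I/O layer encodes on write ...
        with open(path, "w", encoding=encoding, newline="") as f:
            f.write(text)
        # ... and decodes on read, so the program only ever sees str.
        with open(path, "r", encoding=encoding, newline="") as f:
            as_text = f.read()
        # Binary mode: no implicit conversion, the program sees raw bytes.
        with open(path, "rb") as f:
            as_bytes = f.read()
    finally:
        os.remove(path)
    return as_text, as_bytes


if __name__ == "__main__":
    text, raw = roundtrip_text("na\u00efve", "utf-8")
    print(user_default_encoding())
    print(repr(text), repr(raw))
    # Going from bytes to str requires naming the encoding explicitly.
    print(raw.decode("utf-8") == text)
```

Here the only place an encoding has to be spelled out by the programmer is the final `decode()` call on raw bytes; the text-mode reads and writes handle the conversion themselves.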
> Currently you can read a text file and process it, making sure that any changes/requirements only use ASCII characters. It therefore doesn't matter which 8-bit ASCII-superset encoding is used in the original. If you force the programmer to specify the encoding in order to read the file, they would have to pass that requirement on to their user. Their user is even less likely to be encoding-aware than the programmer.
I disagree -- the user most likely has set or received a default encoding when they first got the computer, and that's all they are using. If other tools (notepad, wordpad, emacs, vi etc.) can figure out the encoding, so can Python's I/O library.
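The quoted claim about ASCII-only edits can be sketched as follows. This is only an illustration, assuming a single-byte ASCII-superset encoding: decoding as Latin-1 maps every byte to exactly one code point, so non-ASCII bytes pass through unchanged even when the real encoding is unknown. The function name `ascii_only_edit` and the sample strings are hypothetical.

```python
def ascii_only_edit(raw):
    """Apply an ASCII-only replacement without knowing the real encoding.

    Assumes a single-byte ASCII-superset encoding. Latin-1 is used purely
    as a lossless byte-to-code-point mapping, not as the "true" encoding.
    """
    text = raw.decode("latin-1")
    edited = text.replace("colour", "color")  # touches ASCII characters only
    return edited.encode("latin-1")


if __name__ == "__main__":
    for codec in ("cp1252", "koi8_r"):
        sample = "colour: caf\u00e9" if codec == "cp1252" else "colour: \u043f\u0440\u0438\u0432\u0435\u0442"
        original = sample.encode(codec)
        result = ascii_only_edit(original)
        # The non-ASCII bytes survive, so decoding with the real codec still works.
        print(codec, result.decode(codec))
```

This trick does not carry over to multi-byte encodings whose trailing bytes can collide with ASCII values, which is one reason to let the I/O layer pick the user's actual encoding instead.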
> What this means is that for simple programs where the programmer doesn't want to have to worry about encoding, or can't force the user to be aware, they will read in the file as bytes.
Of course not!
> Modules will quickly and inevitably be created implementing all the 'string methods' for bytes. New programmers will gravitate to these and the old mess will continue, but with a more awkward hybrid than before. (String manipulations of byte sequences will no longer be a core part of the language - and so be harder to use.)
This seems an unlikely development if we do the conversions in the I/O library.
> Not sure what we can do to obviate this, of course... but is this change actually going to improve the situation or make it worse?
I'm not worried about this scenario. "What if all the programmers in the world suddenly became dumb?"
--
--Guido van Rossum (home page: http://www.python.org/~guido/)