[Python-Dev] unicodeobject.c,2.139,2.140 checkin
Jack Jansen Jack.Jansen@oratrix.com
Thu, 25 Apr 2002 23:40:51 +0200
On Thursday, April 25, 2002, at 08:59, Guido van Rossum wrote:
I don't know why it is, but Unicode always seems to unnecessarily heat up any discussion involving it. I would really like to know what is causing this: is it a religious issue, does it have to do with the people involved, or is Unicode inherently controversial? [...] Another issue is that adding Unicode was probably the most invasive set of changes ever made to the Python code base. It has complicated many parts of the code, and added at least a proportional share of bugs. (I found 166 source files in CVS containing some variation on the string "unicode", and 110 bug reports mentioning "unicode" in the SF bug tracker.)
Another thing that bothers me is that it retroactively changed the interpretation of other Python objects. For me it's perfectly logical that a character string is a character string, unless there's a very good reason to treat it differently (a framebuffer scanline, a binary blob, etc.). So if I have an API OpenFileWithUnicodeName() that accepts a Unicode filename, I expect that an 8-bit filename passed to it would be converted on the fly. Other people focus on different sets of APIs, however, and think there's nothing more logical than interpreting the string object as a binary buffer containing UTF-16 values or what-have-you.
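A minimal sketch of the on-the-fly conversion described above, written in present-day Python 3, where `bytes` stands in for the 8-bit string of the time and `str` for the Unicode object. The helper name `coerce_filename` and the Latin-1 default are assumptions chosen for illustration; they are not part of any real API and not what OpenFileWithUnicodeName() actually does.

```python
def coerce_filename(name, encoding="latin-1"):
    """Return name as text, decoding 8-bit input on the fly.

    Hypothetical helper: the default encoding is an assumption.
    """
    if isinstance(name, bytes):
        # Treat the 8-bit input as characters in the assumed encoding.
        return name.decode(encoding)
    if isinstance(name, str):
        # Already text: pass through unchanged.
        return name
    raise TypeError("filename must be bytes or str")


# Both spellings of the same name end up as the same text object.
print(coerce_filename(b"caf\xe9.txt") == coerce_filename("caf\u00e9.txt"))  # True
```

The design choice that matters here is the assumed encoding: the conversion only looks "obvious" once everyone agrees on what the 8-bit characters mean.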
Scanlines and binary blobs were hardly ever mixed with filenames, so there wasn't an issue before Unicode raised its pretty/ugly head.
(Of course it could be argued that Unicode has exposed a design flaw in Python, namely that a single data type was used to store both binary data of unknown interpretation and character arrays, and that there's now little more to be done about that.)
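To make the ambiguity concrete, here is a small Python 3 sketch in which the same four bytes are read both ways: once as 8-bit characters and once as a buffer of UTF-16 values. The byte values and the two encodings (Latin-1 and little-endian UTF-16) are assumptions picked purely for illustration.

```python
raw = b"a\x00b\x00"

# Read as 8-bit characters: four characters, two of them NUL.
as_chars = raw.decode("latin-1")

# Read as a buffer of little-endian UTF-16 values: two characters.
as_utf16 = raw.decode("utf-16-le")

print(len(as_chars), len(as_utf16))  # 4 2
print(as_utf16)                      # ab
```

Nothing in the bytes themselves says which reading is intended, which is the problem with using one type for both roles.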
- Jack Jansen <Jack.Jansen@oratrix.com>
http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman -