[Python-Dev] OT: Unicode history (was Alternative Impl. for PEP 292)
François Pinard pinard at iro.umontreal.ca
Tue Sep 14 21:15:28 CEST 2004
[alloydflanagan at comcast.net]
[François Pinard]
>> Many people consider that Unicode, or UTF-8 at least, is strongly
>> favouring English (boldly American) over any other script or
>> language. If it has not been so, Americans would never have
>> promoted it so much, and would have rather shown an infinite and
>> eternal reluctance...
> To be fair to the developers of Unicode, I'd suggest that the issue is not favoring (note spelling! :) ) English, but rather keeping compatibility with an enormous amount of existing data which was encoded in ASCII.
Of course, this is the standard and official reason. Yet the net effect of that concern and constraint, as noticed by many foreigners, is that Unicode favours English. (As for the spelling of "favouring", I find it amusing to spell-check my outgoing email with a British dictionary.)
> Which was an English standard, but you can only do so much in 7 bits... As for American reluctance, how are you going to convince anyone to double (at least) the storage requirements for their data, to support languages they never use? That would have cost a great deal of money.
I do not think money has to be expressed in terms of storage. Storage considerations are more likely a justification than an explanation for the reluctance. UTF-8 is such that on disk, and for applications using UTF-8 internally (there are a few), not a single bit is spent on extra storage for English. There are cases, and the current Python approach is one of them, where Unicode may be made fairly unobtrusive on memory consumption, at least in English contexts.
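As a rough illustration of the storage point, here is a small sketch that compares the encoded size of the same text under a few encodings. The helper name `encoded_sizes` and the sample strings are invented for this example and come from no library. It only shows that pure-ASCII English text takes exactly one byte per character in UTF-8, while fixed-width 16-bit or 32-bit forms double or quadruple it.

```python
# Illustrative sketch only: `encoded_sizes` is a hypothetical helper,
# not part of Python or of any Unicode library.

def encoded_sizes(text):
    """Return the number of bytes `text` occupies in several encodings."""
    # The "-le" variants are used so that no byte-order mark is counted.
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {name: len(text.encode(name)) for name in encodings}


english = "Unicode favours English"   # 23 characters, all ASCII
french = "Fran\u00e7ois"              # 8 characters, one of them non-ASCII

# For pure ASCII text, the UTF-8 bytes are identical to the ASCII bytes.
assert english.encode("utf-8") == english.encode("ascii")

print(encoded_sizes(english))  # {'utf-8': 23, 'utf-16-le': 46, 'utf-32-le': 92}
print(encoded_sizes(french))   # {'utf-8': 9, 'utf-16-le': 16, 'utf-32-le': 32}
```

The second sample shows the other side of the argument: a single accented letter already costs an extra byte in UTF-8, a cost that plain English text never pays.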
The complexity added by Unicode, however, is undoubtedly a concern for any implementor wanting to really address that standard, that is, to go further than merely toying with 16-bit characters. This means human time, and this is where the real cost lies.
-- François Pinard http://www.iro.umontreal.ca/~pinard