[Python-Dev] Unicode 5.1.0
Guido van Rossum guido at python.org
Fri Aug 22 16:59:46 CEST 2008
- Previous message: [Python-Dev] Unicode 5.1.0
- Next message: [Python-Dev] Unicode 5.1.0
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Fri, Aug 22, 2008 at 3:47 AM, Fredrik Lundh <fredrik at pythonware.com> wrote:
> On Fri, Aug 22, 2008 at 3:25 AM, Guido van Rossum <guido at python.org> wrote:
>> [MAL]
>>> So while we could say: "we provide access to the Unicode 5.1.0 database", we cannot say: "we support Unicode 5.1.0", simply because we have not reviewed all the necessary changes and implications.
>>
>> Mark's response to this was:
>> """
>> I'd suspect that you'll be as conformant to U5.1.0 as you were to U4.1.0 ;-)
>> """
>
> is the suggestion to replace the 4.1.0 database with a 5.1.0 database, or to add yet another database in that module?
That's up to us. I don't know what the reason was for keeping the 3.2.0 database around -- does anyone here recall ever using it? For what?
I think Mark believes that 5.1.0 is very much backwards compatible with 4.1.0 so that there is no need to retain access to 4.1.0; but as I said I don't know the use case so who knows.
> (how's the 3.2/4.1 dual support implemented? do we have two distinct datasets, or are the differences encoded in some clever way? would it make sense to split the unicodedata module into three separate modules, one for each major Unicode version?)
The current API looks fine to me: unicodedata is the latest version whereas unicodedata.ucd_3_2_0 is the older version. The APIs are the same; there's a tiny bit of code in the generated _db.h file that expresses the differences:
static const change_record* get_change_3_2_0(Py_UCS4 n)
{
    int index;
    if (n >= 0x110000) index = 0;
    else {
        index = changes_3_2_0_index[n>>7];
        index = changes_3_2_0_data[(index<<7)+(n & 127)];
    }
    return change_records_3_2_0+index;
}
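
For readers who don't want to squint at the C, here is a rough Python sketch of the same two-level lookup idea. It is only an illustration: the names SHIFT, BLOCK, build_two_level and lookup are made up for this example, the table contents are toy data, and the real tables are produced by the generator script rather than by anything like this.

```python
SHIFT = 7            # same shift as in the C snippet: blocks of 128 code points
BLOCK = 1 << SHIFT

def build_two_level(values):
    """Split a per-code-point list into (index, data), storing each distinct block once."""
    index, data, seen = [], [], {}
    for start in range(0, len(values), BLOCK):
        block = tuple(values[start:start + BLOCK])
        if block not in seen:
            seen[block] = len(data) // BLOCK   # block number inside data
            data.extend(block)
        index.append(seen[block])
    return index, data

def lookup(index, data, n):
    """Mirror of the C lookup: returns a record number, 0 meaning 'no change'."""
    if n >= 0x110000:
        return 0
    block_no = index[n >> SHIFT]
    return data[(block_no << SHIFT) + (n & (BLOCK - 1))]

# Toy input: almost every code point maps to record 0, one maps to record 3.
values = [0] * 0x110000
values[0x1E9E] = 3
index, data = build_two_level(values)
print(len(index), len(data))
print(lookup(index, data, 0x1E9E), lookup(index, data, 0x41))
```

Because nearly all blocks are identical, the data table stays tiny; that is what keeps the per-version difference small.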

static Py_UCS4 normalization_3_2_0(Py_UCS4 n)
{
    switch(n) {
    case 0x2f868: return 0x2136A;
    case 0x2f874: return 0x5F33;
    case 0x2f91f: return 0x43AB;
    case 0x2f95f: return 0x7AAE;
    case 0x2f9bf: return 0x4D57;
    default: return 0;
    }
}
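
From the Python side, the two views can be compared with something like the snippet below. This is a minimal sketch: it assumes a Python whose bundled database is at least Unicode 5.1 (so that U+1E9E is assigned), and the variable names are just for illustration. The version reported for the current database depends on the Python build, so no value is shown for it.

```python
import unicodedata

current = unicodedata             # latest database bundled with this Python
old = unicodedata.ucd_3_2_0       # Unicode 3.2.0 view, same API

print(current.unidata_version, old.unidata_version)

# U+1E9E LATIN CAPITAL LETTER SHARP S was added in Unicode 5.1.
sharp_s = "\u1e9e"
print(current.category(sharp_s))                  # Lu
print(old.category(sharp_s))                      # Cn (unassigned in 3.2.0)
print(old.name(sharp_s, "<no name in 3.2.0>"))    # falls back to the default

# One of the code points special-cased in normalization_3_2_0 above.
compat = "\U0002f868"
old_nfd = old.normalize("NFD", compat)
new_nfd = current.normalize("NFD", compat)
print(hex(ord(old_nfd)), hex(ord(new_nfd)))
```

The old view should give 0x2136a for the last case, matching the switch statement, while the current database gives a different mapping.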

--
--Guido van Rossum (home page: http://www.python.org/~guido/)