[Python-Dev] Unicode 5.1.0

Guido van Rossum guido at python.org
Fri Aug 22 18:12:55 CEST 2008


2008/8/22 Fredrik Lundh <fredrik at pythonware.com>:

On Fri, Aug 22, 2008 at 4:59 PM, Guido van Rossum <guido at python.org> wrote:

(how's the 3.2/4.1 dual support implemented? do we have two distinct datasets, or are the differences encoded in some clever way? would it make sense to split the unicodedata module into three separate modules, one for each major Unicode version?)

The current API looks fine to me: unicodedata is the latest version whereas unicodedata.ucd_3_2_0 is the older version. The APIs are the same; there's a tiny bit of code in the generated db.h file that expresses the differences:

    static const change_record* get_change_3_2_0(Py_UCS4 n)
    {
        int index;
        if (n >= 0x110000) index = 0;
        else {
            index = changes_3_2_0_index[n>>7];
            index = changes_3_2_0_data[(index<<7)+(n & 127)];
        }
        return change_records_3_2_0+index;
    }

there's a bunch of data tables as well, but they don't seem to be very large. looks like Martin did a thorough job here.

... digging digging digging ...

yes, the generator script produces difference tables between the main version and a list of older versions. I'd say it's worth running the script on the 5.1.0 tables, and if it doesn't choke, compare the resulting table with the corresponding table for 4.1.0 (a simple loop fetching the main properties for all code points). if the differences look reasonably small, switch to 5.1.0 and keep the others.
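For readers unfamiliar with the lookup quoted above (shift by 7, then mask with 127), here is a minimal Python sketch of the same two-stage table idea. It is not the generator script's actual code: the names `build_tables` and `get_change`, the dict-based input, and the convention that record number 0 means "no change" are assumptions made for illustration only.

```python
SHIFT = 7                 # 128 code points per block, matching n >> 7
BLOCK = 1 << SHIFT
MASK = BLOCK - 1          # 127, matching n & 127
LIMIT = 0x110000          # one past the last Unicode code point


def build_tables(changes):
    """Compress a sparse {code_point: record_number} dict into two tables.

    Hypothetical helper: record number 0 is assumed to mean "no change".
    Blocks with identical contents are stored only once in `data`.
    """
    index = []            # block number -> position of that block in data
    data = []             # concatenation of the unique blocks
    seen = {}             # block contents -> position in data
    for start in range(0, LIMIT, BLOCK):
        block = tuple(changes.get(cp, 0) for cp in range(start, start + BLOCK))
        if block not in seen:
            seen[block] = len(data) // BLOCK
            data.extend(block)
        index.append(seen[block])
    return index, data


def get_change(index, data, n):
    """Return the record number for code point n (0 if out of range)."""
    if n >= LIMIT:
        return 0
    block = index[n >> SHIFT]
    return data[(block << SHIFT) + (n & MASK)]
```

Because most blocks contain no changes at all, they collapse into a single shared all-zero block, which is presumably why the difference tables stay small.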

Right, that's my hope as well. I believe the changes between 3.2 and 4.1 were much larger than more recent changes. (Yay convergence! :-)
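A rough sketch of the "simple loop" comparison, written against the public unicodedata API rather than the generated tables. It assumes Python 3 and compares whatever two database objects it is given; the helper names `main_properties` and `count_differences`, and the choice of which properties count as "main", are assumptions for illustration. A stock build only exposes the current database and `unicodedata.ucd_3_2_0`, so comparing 5.1.0 against 4.1.0 this way would need two separate builds.

```python
import sys
import unicodedata


def main_properties(db, ch):
    """Collect a handful of per-character properties from a database object."""
    return (
        db.category(ch),
        db.bidirectional(ch),
        db.combining(ch),
        db.mirrored(ch),
        db.decomposition(ch),
        db.east_asian_width(ch),
        db.numeric(ch, None),   # None instead of ValueError for non-numerics
    )


def count_differences(new_db, old_db, limit=sys.maxunicode + 1):
    """Count code points below `limit` whose properties differ."""
    differing = 0
    for cp in range(limit):
        ch = chr(cp)
        if main_properties(new_db, ch) != main_properties(old_db, ch):
            differing += 1
    return differing
```

Calling `count_differences(unicodedata, unicodedata.ucd_3_2_0)` walks every code point and may take a few seconds; passing a smaller `limit` restricts the scan to a prefix of the code space.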

I can tinker a little with this over the weekend, unless Martin tells me not to ;-)

That would be great!

--
--Guido van Rossum (home page: http://www.python.org/~guido/)