[Python-Dev] Unicode 5.1.0

"Martin v. Löwis" martin at v.loewis.de
Sun Aug 24 21:35:24 CEST 2008


is the suggestion to replace the 4.1.0 database with a 5.1.0 database, or to add yet another database in that module?

I would replace it.

(how's the 3.2/4.1 dual support implemented?

The compiler needs data files for all supported versions, with old_versions listing the, well, old versions. It then computes deltas, expecting that they should mostly consist of new assignments (i.e. characters unassigned in 3.2 might be assigned in newer versions). It detects all differences, but might not be able to represent all changes.
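To illustrate the delta step, here is a minimal sketch in Python. It is not the actual database compiler: the function name compute_delta, the dict-based tables, and the sample records are all hypothetical, and only one property (the general category) is tracked. It separates the expected case (a code point unassigned in the old version, assigned in the new one) from any other difference.

```python
# Hypothetical sketch of a per-version delta; not the real compiler code.
UNASSIGNED = "Cn"  # general category used for unassigned code points


def compute_delta(old, new):
    """Compare two tables mapping code point -> general category.

    Returns (newly_assigned, changed):
      newly_assigned: code points unassigned in `old` but assigned in `new`
      changed:        any other difference, as (old_category, new_category)
    """
    newly_assigned = {}
    changed = {}
    for cp in sorted(set(old) | set(new)):
        old_cat = old.get(cp, UNASSIGNED)
        new_cat = new.get(cp, UNASSIGNED)
        if old_cat == new_cat:
            continue
        if old_cat == UNASSIGNED:
            newly_assigned[cp] = new_cat      # the expected, common case
        else:
            changed[cp] = (old_cat, new_cat)  # needs a richer representation
    return newly_assigned, changed


# Toy tables (made-up records, for illustration only).
old_table = {0x0041: "Lu", 0x1000: "Lo"}
new_table = {0x0041: "Lu", 0x1000: "Lm", 0x2000: "Ll"}

newly_assigned, changed = compute_delta(old_table, new_table)
```

In this toy run, 0x2000 lands in newly_assigned and 0x1000 in changed. Entries of the second kind are the ones a simple "new characters" list cannot express on its own.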

do we have two distinct datasets, or are the differences encoded in some clever way?

The latter. It doesn't really need to be that clever: primarily just a compressed list of "new" characters is needed, per version.
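One simple way to compress such a list is to store runs of consecutive code points as ranges. The sketch below shows that idea only; it is an assumption for illustration, not necessarily the encoding the generated tables use, and compress_ranges and is_new are hypothetical names.

```python
import bisect


def compress_ranges(code_points):
    """Collapse code points into sorted, inclusive (start, end) ranges."""
    ranges = []
    for cp in sorted(set(code_points)):
        if ranges and cp == ranges[-1][1] + 1:
            # Extend the current run by one code point.
            ranges[-1] = (ranges[-1][0], cp)
        else:
            # Start a new run.
            ranges.append((cp, cp))
    return ranges


def is_new(cp, ranges):
    """Return True if `cp` falls inside one of the compressed ranges."""
    starts = [start for start, _ in ranges]
    i = bisect.bisect_right(starts, cp) - 1  # last range starting at or before cp
    return i >= 0 and ranges[i][1] >= cp


# Toy list of "new" code points for one version (made-up values).
new_in_version = [5, 1, 2, 3, 9, 10]
ranges = compress_ranges(new_in_version)
```

Since new assignments tend to arrive in contiguous blocks, a few ranges can stand in for a long list of individual code points. A lookup for an old version then reports "unassigned" whenever is_new is true.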

would it make sense to split the unicodedata module into three separate modules, one for each major Unicode version?)

You couldn't use the space savings then, I suppose.

Regards, Martin
