[Python-Dev] Odd lines in unicodedata_db.h

Stephen J. Turnbull stephen at xemacs.org
Sun Apr 4 12:59:14 CEST 2010

Previous message: [Python-Dev] Odd lines in unicodedata_db.h
Next message: [Python-Dev] Odd lines in unicodedata_db.h
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Amaury Forgeot d'Arc writes:

I don't think so. Unicode 3.2 did contain two entries with large numeric values. The file Unihan-3.2.0.txt contains these two lines:

U+4EAC kPrimaryNumeric 10,000,000,000,000,000 ten quadrillion (American)
U+5793 kPrimaryNumeric 100,000,000,000,000,000,000 hundred quintillion (American)

They are related to the Chinese numbering system. I recall U+4EAC having that value from my textbooks (it's the "kyo" in Tokyo, and the "jing" in "Beijing", so quite memorable), and U+5793 looks familiar (it's not otherwise used in Japanese AFAIK, so I'm not sure, but it seems quite plausible that there would be a character for 10000^5).
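As a quick check on the arithmetic, the two quoted entries can be parsed and compared against powers of 10000. This is only a sketch: it assumes the whitespace-separated layout shown above (the actual Unihan file may well be tab-delimited), and parse_primary_numeric is a hypothetical helper written for illustration, not part of unicodedata.

```python
def parse_primary_numeric(line):
    """Parse one 'U+XXXX kPrimaryNumeric <value> <gloss...>' line.

    Hypothetical helper; assumes the layout quoted in this message.
    Returns (character, integer value).
    """
    codepoint, field, rest = line.split(None, 2)
    if field != "kPrimaryNumeric":
        raise ValueError("unexpected field: " + field)
    # The value is the first token after the field name, with commas
    # used as thousands separators; the English gloss follows it.
    digits = rest.split()[0].replace(",", "")
    return chr(int(codepoint[2:], 16)), int(digits)


lines = [
    "U+4EAC kPrimaryNumeric 10,000,000,000,000,000 ten quadrillion (American)",
    "U+5793 kPrimaryNumeric 100,000,000,000,000,000,000 hundred quintillion (American)",
]
values = dict(parse_primary_numeric(line) for line in lines)

# Both values are exact powers of 10000, as expected for myriad-based units.
is_fourth_power = values["\u4eac"] == 10000 ** 4   # 10**16
is_fifth_power = values["\u5793"] == 10000 ** 5    # 10**20
```

Python integers are arbitrary precision, so the comparison is exact; a C double, as used in the generated table, can only hold an approximation of values this large.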

For some reason newer versions of the Unicode standard removed these values.

The characters are still there. The numeric values were probably removed because in practice they're not actually used (at least, almost never in Japanese). It seems a little sad to save 150 bytes or so in a table and lose the historical meanings.
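To see what a given interpreter actually reports for these two characters, something along these lines should work. It is a sketch using only the documented unicodedata.numeric(chr, default) call and the unicodedata.ucd_3_2_0 object; the results depend on the Unicode version the interpreter was built with, so no particular output is claimed here.

```python
import unicodedata

chars = ["\u4eac", "\u5793"]

# Query both the current database and the frozen 3.2.0 view.
# Passing None as the default avoids a ValueError when a character
# has no numeric value in the database being consulted.
report = {}
for ch in chars:
    report[ch] = {
        "current": unicodedata.numeric(ch, None),
        "3.2.0": unicodedata.ucd_3_2_0.numeric(ch, None),
    }

print("Unicode database version:", unicodedata.unidata_version)
for ch, result in report.items():
    print("U+%04X" % ord(ch), result)
```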
