Issue 36486: Bugs and inconsistencies in unicodedata

In unicodedata, the functions lookup and name have some bugs and inconsistencies.

lookup matches case-insensitively, except for the algorithmic names of Hangul syllables and CJK unified ideographs, which must be in all caps. The documentation does not explain how character names are fuzzily matched.
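A minimal way to probe the case handling is sketched below. The helper try_lookup is a hypothetical name introduced for illustration, not part of unicodedata, and the lowercase results for the algorithmic names may differ between Python versions, so the script only reports what it sees.

```python
import unicodedata


def try_lookup(name):
    """Return the character for `name`, or None if lookup() rejects it.

    Hypothetical helper for this report; unicodedata.lookup raises
    KeyError when it does not recognise a name.
    """
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None


samples = [
    "LATIN SMALL LETTER A",
    "latin small letter a",
    "CJK UNIFIED IDEOGRAPH-4E00",
    "cjk unified ideograph-4e00",
    "HANGUL SYLLABLE GA",
    "hangul syllable ga",
]

for name in samples:
    ch = try_lookup(name)
    # Report the code point rather than the character to avoid console
    # encoding problems.
    result = "rejected" if ch is None else f"U+{ord(ch):04X}"
    print(f"{name!r}: {result}")
```

On an affected build, the lowercase CJK and Hangul names are the ones expected to be reported as rejected.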

lookup accepts names like “CJK UNIFIED IDEOGRAPH-04E00”, where the code point has a superfluous leading zero; the character's actual name is “CJK UNIFIED IDEOGRAPH-4E00”.
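For comparison, here is a sketch of a strict check on the hexadecimal suffix, assuming the usual convention of at least four uppercase hex digits with no extra leading zeros. The function name is_canonical_hex_suffix is hypothetical and only illustrates what a stricter lookup might enforce; it is not how unicodedata is implemented.

```python
import unicodedata


def is_canonical_hex_suffix(suffix):
    """Return True if `suffix` is a code point in its usual written form.

    Assumption for this sketch: the usual form is what format(cp, "04X")
    produces, i.e. uppercase hex, at least four digits, no extra zeros.
    """
    try:
        code_point = int(suffix, 16)
    except ValueError:
        return False
    # Round-tripping rejects leading zeros, lowercase digits, "0x"
    # prefixes, underscores and surrounding whitespace in one comparison.
    return format(code_point, "04X") == suffix


for suffix in ("4E00", "04E00"):
    name = "CJK UNIFIED IDEOGRAPH-" + suffix
    try:
        ch = unicodedata.lookup(name)
        outcome = f"accepted as U+{ord(ch):04X}"
    except KeyError:
        outcome = "rejected"
    print(f"{name!r}: canonical={is_canonical_hex_suffix(suffix)}, lookup {outcome}")
```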

lookup and name don’t implement rule NR2, defined in chapter 4 of the Unicode Standard, for the names of Tangut ideographs.
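As a rough illustration of NR2, the name is the range's prefix followed by the code point in hexadecimal. The helper nr2_name below is a hypothetical sketch and takes the prefix as an argument rather than encoding any range table; whether name() agrees for U+17000 depends on the Python version, so the script prints both values instead of asserting they match.

```python
import unicodedata


def nr2_name(prefix, code_point):
    """Derive an NR2-style name: prefix plus the code point in hex.

    Hypothetical helper for this report; uppercase hex, at least four
    digits, as in "CJK UNIFIED IDEOGRAPH-4E00".
    """
    return f"{prefix}{code_point:04X}"


code_point = 0x17000  # first Tangut ideograph
expected = nr2_name("TANGUT IDEOGRAPH-", code_point)

# name() returns the default (None here) when it has no name for the
# character, instead of raising ValueError.
actual = unicodedata.name(chr(code_point), None)

print("expected by NR2:", expected)
print("unicodedata.name:", actual)
```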