[Python-Dev] Python and the Unicode Character Database

Stephen J. Turnbull stephen at xemacs.org
Tue Nov 30 05:20:11 CET 2010


M.-A. Lemburg writes:

 > Just because ASCII-proponents may have a hard time reading such literals,

That's not the point.

 > doesn't mean that script users have the same trouble.

The script users may have no trouble reading them, but that doesn't mean it's not a YAGNI. In Japanese, it's a YAGNI except in addresses on New Year cards and in dates, which could be handled by specialized modules, or by a generic module for extracting numeric information from general (as opposed to program) text. Neither of those is likely to appear in program text in a context where it would be used as a numeric literal.

In fact, Python does consider it a YAGNI for Han! Although my apartment number would be written "七〇四" on a New Year card, Python won't parse it as 704: unicodedata considers those digits to be Lo, except for "〇", which fails anyway because it's Nl, not Nd. (To add insult to injury, it doesn't even return numeric values for those characters, even though any Han-user would consider them numeric when used in isolation, except that Japanese would be likely to consider "〇" to be the non-numeric "maru" symbol, i.e., circle, meaning "OK"!)
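For anyone who wants to check this locally, here is a minimal sketch using only the standard unicodedata module. The helper names describe() and try_int() are made up for illustration. What unicodedata.numeric() returns for the Han characters depends on the Unicode database shipped with your Python, so the sketch passes a default of None and makes no claim about that result.

```python
import unicodedata

def describe(text):
    """Return (char, general category, numeric value or None) per character.

    The numeric value comes from the Unicode database bundled with the
    running Python, so it may differ between versions.
    """
    return [
        (ch, unicodedata.category(ch), unicodedata.numeric(ch, None))
        for ch in text
    ]

def try_int(text):
    """Return int(text), or None if int() rejects the string."""
    try:
        return int(text)
    except ValueError:
        return None

han = "\u4e03\u3007\u56db"           # the apartment number written in Han digits
arabic_indic = "\u0667\u0660\u0664"  # 7, 0, 4 as Arabic-Indic digits (category Nd)

categories = [unicodedata.category(ch) for ch in han]
parsed_han = try_int(han)
parsed_arabic_indic = try_int(arabic_indic)

if __name__ == "__main__":
    for row in describe(han):
        print(row)
    print("int(han):", parsed_han)
    print("int(arabic_indic):", parsed_arabic_indic)
```

The contrast is the point: int() accepts any run of Nd characters, so the Arabic-Indic string parses, while the Han string is rejected because none of its characters are Nd.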

The whole concept of numeric in Unicode is a mess; why import that mess into Python?

Can you give any examples where people do computation, keep books, or do nuclear physics in non-Arabic numerals? I suppose Arabic users might, but even there I suspect not.


