[Python-Dev] Python and the Unicode Character Database

Stephen J. Turnbull stephen at xemacs.org
Thu Dec 2 08:49:24 CET 2010


Ben Finney writes:

> Input from an existing text file, as I said earlier. Or any other way of text data making its way into a Python program.
>
> Direct entry at the console is a red herring.

I don't think it is. Not at all. Here's why: '''print "%d" % some_integer''' doesn't now, and never will (unless Kristan gets his Python 2.8), produce Arabic-Indic or Han numerals. Not in any programming language I know of, not in Microsoft Excel, and definitely not in Python 2. So any non-ASCII digits in that text file were typed by somebody at some point. If they're Han numerals, that somebody had way too much time on his hands; it certainly wasn't a working accountant or a graduate assistant in a research lab.
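As a rough Python 3 illustration of this point, assuming only the standard library: the %d directive emits ASCII digits for any integer, even though the Unicode Character Database (exposed through the unicodedata module) classifies characters such as U+0663 ARABIC-INDIC DIGIT THREE as decimal digits. The helper formatted_digits is hypothetical, written only for this sketch.

```python
import unicodedata

ASCII_DIGITS = set("0123456789")


def formatted_digits(n: int) -> set:
    # Hypothetical helper: collect the digit characters "%d" emits for n.
    return {c for c in "%d" % n if c.isdigit()}


# Old-style formatting yields only ASCII digits, whatever the value.
for n in (0, 7, -42, 1234567890):
    assert formatted_digits(n) <= ASCII_DIGITS

# The UCD does define other decimal digits, e.g. U+0663.
three = "\u0663"
print(unicodedata.name(three))      # → ARABIC-INDIC DIGIT THREE
print(unicodedata.category(three))  # → Nd
print(unicodedata.decimal(three))   # → 3

# ...but formatting the integer 3 never produces it.
print("%d" % 3 == three)            # → False
```

Locale-aware formatting (the 'n' presentation type) changes only the grouping and decimal separators, not the digits themselves.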

How about old archived texts, copied and recopied? At least for Japanese, old archival (text) data will all be in ASCII, because the earliest implementations of Japanese-language text used JIS X 0201 (or its predecessor), which doesn't have Han digits (and kana digits don't exist even if you write with a brush and ink AFAIK). Ditto Arabic, I would imagine; ISO 8859-6 (aka Latin/Arabic) does not contain the Arabic-Indic digits that have been presented here earlier AFAICT. Note that there's plenty of space for them in that code table (e.g., 0xB0-0xB9 is empty). Apparently nobody ever thought it was useful to have them!
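The code-table claim can be checked against Python's own codecs. A minimal sketch, assuming Python 3 and the standard iso8859_6 codec (generated from the Unicode consortium's 8859-6 mapping table): strict decoding of bytes 0xB0-0xB9 fails because they are unassigned, the only digits the table contains are the ASCII ones at 0x30-0x39, and the Arabic-Indic digits U+0660-U+0669 cannot be encoded at all. The helper decodes is hypothetical.

```python
def decodes(byte: int, codec: str = "iso8859_6") -> bool:
    # Hypothetical helper: does this single byte map to a character?
    try:
        bytes([byte]).decode(codec)
        return True
    except UnicodeDecodeError:
        return False


# 0xB0-0xB9 are unassigned in ISO 8859-6, so strict decoding fails.
print([hex(b) for b in range(0xB0, 0xBA) if decodes(b)])  # → []

# The only digits in the table are the ASCII ones at 0x30-0x39.
print(bytes(range(0x30, 0x3A)).decode("iso8859_6"))        # → 0123456789

# The Arabic-Indic digits U+0660-U+0669 have no encoding at all.
arabic_indic = "".join(chr(cp) for cp in range(0x0660, 0x066A))
try:
    arabic_indic.encode("iso8859_6")
    print("encodable")
except UnicodeEncodeError as exc:
    print("not encodable:", exc.reason)
```

By contrast, Arabic punctuation such as the Arabic comma (U+060C, byte 0xAC) is in the table, so the codec is not simply missing Arabic-script characters.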

So, which culture, using which script and in which application, inputs numeric data in other than ASCII digits? Or would want to, if only somebody would tell them they can do it in Python? Hearsay will do, for starters.


