[Python-Dev] Python and the Unicode Character Database

Terry Reedy tjreedy at udel.edu
Fri Dec 3 02:52:23 CET 2010

Previous message: [Python-Dev] Python and the Unicode Character Database
Next message: [Python-Dev] Python and the Unicode Character Database
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 12/2/2010 6:54 PM, Alexander Belopolsky wrote:

On Thu, Dec 2, 2010 at 4:14 PM, M.-A. Lemburg <mal at egenix.com> wrote:
..

Some examples:

http://www.bdl.gov.lb/circ/intpdf/int123.pdf

I looked at this one more closely. While I cannot understand what it says, it appears that Arabic numerals are used in dates. It looks like Python won't be able to deal with those:

When I travelled in South Asia around 25 years ago, Arabic and Indic numerals were in obvious use in stores, on road signs, and in banks (as on money-exchange receipts). I learned the digits partly for self-protection ;-). I have no real idea of what is done now in computerized business, but I assume the native digits are used.

It may well be that there is no Python software yet that operates with native digits. The lack of direct output capability would hinder that. Of course, someone could run both input and output through language-specific str.translate digit translators.
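A language-specific digit translator of the kind mentioned above might look like the following minimal sketch. It assumes the Arabic-Indic digits U+0660..U+0669 only; the helper names (to_ascii_digits, to_native_digits) are hypothetical, and other scripts would each need their own table.

```python
# Hypothetical digit translators built on str.maketrans / str.translate.
# Assumption: one script only, the Arabic-Indic digits U+0660..U+0669.
ASCII_DIGITS = "0123456789"
ARABIC_INDIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))

# Input direction: native digits -> ASCII, so existing parsers work unchanged.
TO_ASCII = str.maketrans(ARABIC_INDIC_DIGITS, ASCII_DIGITS)
# Output direction: ASCII digits -> native digits, for display.
TO_NATIVE = str.maketrans(ASCII_DIGITS, ARABIC_INDIC_DIGITS)


def to_ascii_digits(text: str) -> str:
    """Replace Arabic-Indic digits with ASCII digits; other characters pass through."""
    return text.translate(TO_ASCII)


def to_native_digits(text: str) -> str:
    """Replace ASCII digits with Arabic-Indic digits; other characters pass through."""
    return text.translate(TO_NATIVE)


if __name__ == "__main__":
    from datetime import datetime

    native = "\u0661\u0669\u0669\u0669/\u0661\u0660/\u0662\u0669"  # 1999/10/29
    # Translate on the way in, then parse with an ordinary format string...
    d = datetime.strptime(to_ascii_digits(native), "%Y/%m/%d")
    # ...and translate on the way out for display.
    print(to_native_digits(d.strftime("%Y/%m/%d")))
```

The translation happens entirely outside datetime, so the parsing code never needs to know which script the user typed.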

>>> datetime.strptime('١٩٩٩/١٠/٢٩', '%Y/%m/%d')

Googling ١٩٩٩ gets about 83,000 hits.

..
ValueError: time data '١٩٩٩/١٠/٢٩' does not match format '%Y/%m/%d'

Interestingly,

>>> datetime.strptime('١٩٩٩', '%Y')
datetime.datetime(1999, 1, 1, 0, 0)

which further suggests that support for such numerals is accidental. As I think more about it, though, I am becoming less averse to accepting these numerals for base-10 integers.
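One plausible reason the support looks accidental is sketched below. It assumes that the %Y and %m directives are matched with regex fragments like the ones in CPython's Lib/_strptime.py (check your version to confirm): the year pattern uses \d, which on str patterns matches any Unicode decimal digit, while the month pattern spells out ASCII digits literally.

```python
import re

# Assumption: these patterns mirror the %Y and %m fragments in CPython's
# _strptime module; they are reproduced here for illustration only.
YEAR_PATTERN = r"(?P<Y>\d\d\d\d)"
MONTH_PATTERN = r"(?P<m>1[0-2]|0[1-9]|[1-9])"

year = "\u0661\u0669\u0669\u0669"  # Arabic-Indic digits for 1999
month = "\u0661\u0660"             # Arabic-Indic digits for 10

# On str patterns, \d matches any category-Nd character...
m = re.fullmatch(YEAR_PATTERN, year)
print(m is not None)         # True
# ...and int() accepts such digits, so %Y "works" as a side effect.
print(int(m.group("Y")))     # 1999
# The month alternatives are literal ASCII digits, so this never matches.
print(re.fullmatch(MONTH_PATTERN, month))  # None
```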

Both input and output of native digits are needed for educational programming, though translation tables might be enough.

Integers can easily be extracted from text using a simple regex, and '\d' accepts all characters of category Nd. I would, however, require that all digits come from the same block, which is not hard because Unicode now promises to encode them only in contiguous blocks of 10.
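The same-block rule proposed above could be checked roughly as in this sketch. The helpers (extract_ints, parse_same_block_int) are hypothetical, not a stdlib API; the check relies on the stated Unicode guarantee that decimal digits come in contiguous runs of ten ordered 0..9, so every digit's block can be identified by the code point of its zero.

```python
import re
import unicodedata

# \d on a str pattern matches every category-Nd character.
_DIGIT_RUN = re.compile(r"\d+")


def _block_zero(ch: str) -> int:
    """Code point of the '0' in ch's digit block.

    Assumes Nd digits are encoded in contiguous runs of ten ordered 0..9,
    so zero = ord(ch) - decimal value of ch.
    """
    return ord(ch) - unicodedata.decimal(ch)


def parse_same_block_int(run: str) -> int:
    """Convert a run of Nd digits to int, requiring a single digit block."""
    zeros = {_block_zero(ch) for ch in run}
    if len(zeros) != 1:
        raise ValueError(f"digits from mixed blocks: {run!r}")
    return int(run)


def extract_ints(text: str) -> list[int]:
    """Return every maximal digit run in text as an int (same-block only)."""
    return [parse_same_block_int(m.group()) for m in _DIGIT_RUN.finditer(text)]
```

For example, extract_ints("a ١٢٣ b 45") returns [123, 45], while parse_same_block_int rejects "1٢3", a string that int() on its own converts to 123 without complaint.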

That seems sensible.

This rule seems to address some of the security issues, because it is unlikely that a system that can display some of the local digits would be unable to display all of them properly.

I still don't think it makes any sense to accept them in float().

For the present, I would pretty well agree with that, at least until we know more.

You have raised an important issue. It is a bit of a chicken-and-egg problem, though: we will not really know what is needed until Python is used more in non-English/non-European contexts, while such usage may await better support.

-- Terry Jan Reedy


More information about the Python-Dev mailing list