[Python-Dev] Python and the Unicode Character Database

Steven D'Aprano steve at pearwood.info
Tue Nov 30 14:23:22 CET 2010


Stephen J. Turnbull wrote:
> Lennart Regebro writes:
>
> > I think it is more important. In python 3, you can never ever assume anything is ASCII any more.
>
> Sure you can. In Python program text, all keywords will be ASCII (English, even, though it may be enNL.UTF-8) for the foreseeable future. I see no reason not to make a similar promise for numeric literals. I see no good reason to allow compatibility full-width Japanese "ASCII" numerals or Arabic cursive numerals in "for i in range(...)" for example.

I agree with you that numeric literals should be restricted to the ASCII digits. I don't think anyone here is arguing differently -- if they are, they should speak up and try to make the case for allowing numeric literals in arbitrary scripts. Python doesn't currently allow non-ASCII numeric literals, and even if such a change were desirable, it would run up against the moratorium. So let's just forget the specter of code like:

x = math.sqrt(١٢٣٤.٥٦ ** 一.一)

It ain't gonna happen :)
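
For anyone who wants to check this for themselves, here is a minimal sketch, assuming a Python 3 interpreter. The helper name `compiles` is hypothetical and only there for illustration; the digits are written as escapes so the example does not depend on the file's encoding.

```python
def compiles(source):
    """Return True if `source` parses as a Python expression, False on SyntaxError."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# The same expression written with ASCII digits and with Arabic-Indic digits
# (U+0661..U+0666); the decimal point and operator are ASCII in both.
ascii_literal = "1234.56 ** 1.1"
arabic_indic_literal = "\u0661\u0662\u0663\u0664.\u0665\u0666 ** 1.1"

print(compiles(ascii_literal))         # True
print(compiles(arabic_indic_literal))  # False: the tokenizer rejects it
```

The check uses `compile()` rather than `eval()` so that only parsing is exercised and nothing is executed.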

But I think there is a good case for allowing the constructors int, float and complex to continue to accept numeric strings with non-ASCII digits. The code already exists, there are probably people out there who rely on it, and in the absence of any convincing demonstration that the existing behaviour is causing widespread difficulty, we should leave well enough alone.

Various people have suggested that there should be a function in the locale module that handles numeric string input in non-ASCII digits. This is a de facto admission that there are use-cases for taking user input like the string '٣' and turning it into the int 3. Python can already do this, and has been able to for many years:

[steve at sylar ~]$ python2.4
Python 2.4.6 (#1, Mar 30 2009, 10:08:01)
[GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> int(u'٣')
3
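
The same behaviour can be sketched for Python 3, where every str is Unicode. This is an illustration under that assumption, not part of the session above; the constant names are made up for the example. It also shows where the constructors draw the line: they accept characters in the Unicode category Nd (decimal digit), but not characters such as the CJK numeral 一 that merely carry a numeric value.

```python
import unicodedata

ARABIC_INDIC_THREE = "\u0663"                                   # '٣'
FULLWIDTH_123 = "\uff11\uff12\uff13"                            # '１２３'
ARABIC_INDIC_FLOAT = "\u0661\u0662\u0663\u0664.\u0665\u0666"    # ASCII '.' as decimal point
CJK_ONE = "\u4e00"                                              # '一'

# Decimal digits from any script are accepted by the constructors.
print(int(ARABIC_INDIC_THREE))    # 3
print(int(FULLWIDTH_123))         # 123
print(float(ARABIC_INDIC_FLOAT))  # 1234.56

# The Unicode Character Database category explains the difference:
print(unicodedata.category(ARABIC_INDIC_THREE))  # Nd (decimal digit)
print(unicodedata.category(CJK_ONE))             # Lo (letter, other)

# A character with a numeric value but no decimal-digit property is rejected.
try:
    int(CJK_ONE)
    cjk_accepted = True
except ValueError:
    cjk_accepted = False
print(cjk_accepted)                  # False
print(unicodedata.numeric(CJK_ONE))  # 1.0
```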

It seems to me that there's no need to move this functionality into locale.

-- Steven
