[Python-Dev] Python and the Unicode Character Database

Alexander Belopolsky alexander.belopolsky at gmail.com
Mon Nov 29 20:38:46 CET 2010

On Mon, Nov 29, 2010 at 1:33 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:

On Mon, 29 Nov 2010 08:22:46 +0100 "Martin v. Löwis" <martin at v.loewis.de> wrote:

> The former ensures that literals in code are always readable; the latter allows users to enter numbers in their own number system. How could that be a bad thing?

It's YAGNI, feature bloat. It gives the illusion of supporting something that actually isn't supported very well (namely, parsing local number strings). I claim that there is no meaningful application of this feature. Still, if it's not detrimental and it's not difficult to support, then why do you care?

It is difficult to support. A fix for issue10557 would be much simpler if we did not support non-European digits. I have now added a patch that handles non-ASCII digits, so you can see what's involved. Note that when the Unicode Consortium inevitably adds more Nd characters to the non-BMP planes, we will have to add surrogate-pair support to this code.
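To make the behavior under discussion concrete, here is a minimal sketch of what accepting non-ASCII digits looks like from the Python side. It is not the issue10557 patch; it assumes a CPython 3 interpreter where int() accepts any character in the Unicode Nd (decimal digit) category, and the helper name describe_digits is made up for this illustration.

```python
import unicodedata


def describe_digits(text):
    """Return (general category, decimal value) for each character.

    Hypothetical helper for illustration only; unicodedata.decimal()
    raises ValueError for characters that have no decimal value.
    """
    return [(unicodedata.category(ch), unicodedata.decimal(ch)) for ch in text]


# ARABIC-INDIC DIGIT ONE, TWO, THREE (all inside the BMP)
arabic_indic = "\u0661\u0662\u0663"

# MATHEMATICAL BOLD DIGIT ONE, TWO (outside the BMP; on a narrow build
# each of these would be stored as a surrogate pair)
math_bold = "\U0001D7CF\U0001D7D0"

arabic_value = int(arabic_indic)   # parsed the same way as int("123")
math_value = int(math_bold)        # parsed the same way as int("12")

arabic_info = describe_digits(arabic_indic)
math_info = describe_digits(math_bold)
```

Both strings consist only of Nd characters, which is why int() accepts them; the second one is the kind of input that requires surrogate-pair handling wherever code units are 16 bits wide.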

In any case, there is little we can do about it in 3.2 other than fix bugs like issue10557 without breaking currently valid code, so I created a separate issue to continue this debate in the context of 3.3. [issue10581]

Now, I would like to bring this thread back to its subject. Given that the UCD now affects the language definition and the standard library behavior, how should changes to the UCD be handled?

Should Python documentation refer to the specific version of Unicode that it supports?

The current documentation refers to old versions. Should the version be updated, or removed to imply the latest?

How should UCD updates be handled during the language moratorium?

During the PEP 3003 discussion, it was suggested to handle this on a case-by-case basis, but I don't see any discussion of the upgrade to 6.0.0 in PEP 3003. Should this upgrade be backported to 2.7?

How specific should the library reference manual be in defining methods affected by the UCD, such as str.upper()?
What is an acceptable level of variation between Python implementations? For example, if '\UXXXXXXXX'.isalpha() returns true in one implementation, can it return false in another? Note that even CPython narrow and wide builds are presently not consistent in this respect.
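As a rough way to check these questions on a given interpreter, the sketch below reports which UCD version the interpreter was built with and how it classifies one non-BMP letter. It assumes a CPython 3 interpreter; the choice of U+10400 and the name report are illustrative only, and other implementations or builds may well give different answers.

```python
import sys
import unicodedata

# Version of the Unicode Character Database compiled into this interpreter.
ucd_version = unicodedata.unidata_version

# DESERET CAPITAL LETTER LONG I, a letter outside the BMP (category Lu).
deseret = "\U00010400"

report = {
    "ucd_version": ucd_version,
    # 0xFFFF indicates a narrow build, 0x10FFFF a full code point range.
    "max_code_point": hex(sys.maxunicode),
    # On a narrow build this would be 2 (a surrogate pair).
    "length": len(deseret),
    "category": unicodedata.category(deseret) if len(deseret) == 1 else None,
    "isalpha": deseret.isalpha(),
}

for key, value in report.items():
    print(f"{key}: {value}")
```

Comparing this output across builds and implementations is one way to see the inconsistency described above for a single code point.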

[issue10581] http://bugs.python.org/issue10581
