[Python-Dev] Python and the Unicode Character Database

M.-A. Lemburg mal at egenix.com
Thu Dec 2 21:05:21 CET 2010

Previous message: [Python-Dev] PEP 384 accepted
Next message: [Python-Dev] Python and the Unicode Character Database

"Martin v. Löwis" wrote:

>>> Now, one may wonder what precisely a "possibly signed floating point number" is, but most likely, this refers to
>>>
>>>     floatnumber   ::=  pointfloat | exponentfloat
>>>     pointfloat    ::=  [intpart] fraction | intpart "."
>>>     exponentfloat ::=  (intpart | pointfloat) exponent
>>>     intpart       ::=  digit+
>>>     fraction      ::=  "." digit+
>>>     exponent      ::=  ("e" | "E") ["+" | "-"] digit+
>>>     digit         ::=  "0"..."9"

>> I don't see why the language spec should limit the wealth of number formats supported by float().

> If it doesn't, there should be some other specification of what is correct and what is not. It must not be unspecified.

True.
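To make the gap between the quoted grammar and float() concrete, here is a minimal sketch that transcribes the grammar into a regular expression and compares it with what float() accepts. The names `FLOATNUMBER` and `matches_grammar` are made up for this illustration, and the regular expression is only one reading of the grammar as quoted above, not anything taken from the Python sources.

```python
import re

# Direct transcription of the quoted grammar; digit is ASCII "0".."9" only.
DIGIT = r"[0-9]"
INTPART = rf"{DIGIT}+"
FRACTION = rf"\.{DIGIT}+"
EXPONENT = rf"[eE][+-]?{DIGIT}+"
POINTFLOAT = rf"(?:(?:{INTPART})?{FRACTION}|{INTPART}\.)"
EXPONENTFLOAT = rf"(?:{INTPART}|{POINTFLOAT}){EXPONENT}"
FLOATNUMBER = re.compile(rf"(?:{EXPONENTFLOAT}|{POINTFLOAT})")


def matches_grammar(text):
    """Return True if text is a floatnumber according to the quoted grammar."""
    return FLOATNUMBER.fullmatch(text) is not None


def accepted_by_float(text):
    """Return True if the float() constructor accepts text."""
    try:
        float(text)
    except ValueError:
        return False
    return True


# ASCII literal, plain integer, signed value, padded value, special value,
# and the Arabic-Indic digits from this thread (written as escapes).
samples = ["1234.56", "1e5", "123", "-1.5", " 2.5 ", "nan",
           "\u0661\u0662\u0663\u0664.\u0665\u0666"]
for s in samples:
    print(repr(s), matches_grammar(s), accepted_by_float(s))
```

Every sample is accepted by float(), but only the first two match the grammar, which is the mismatch under discussion.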

>> It is not uncommon for Asians and other non-Latin script users to use their own native script symbols for numbers. Just because these digits may look strange to someone doesn't mean that they are meaningless or should be discarded.

> Then these users should speak up and indicate their need, or somebody should speak up and confirm that there are users who actually want '١٢٣٤.٥٦' to denote 1234.56. To my knowledge, there is no writing system in which '١٢٣٤.٥٦e4' means 12345600.0.

I'm not sure what you're after here.
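For reference, the two strings in question can be tried directly. This is a small sketch assuming a Python 3 interpreter; the Arabic-Indic digits are written as escapes to avoid bidirectional display issues, and the variable names are made up for the illustration.

```python
# U+0661..U+0666 are ARABIC-INDIC DIGIT ONE..SIX; the separator is ASCII ".".
arabic_indic = "\u0661\u0662\u0663\u0664.\u0665\u0666"

value = float(arabic_indic)            # same result as float("1234.56")
scaled = float(arabic_indic + "e4")    # same result as float("1234.56e4")

print(value, scaled)
```

Note that the exponent marker and the decimal point stay ASCII in both cases; only the digits are mapped.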

>> Please also remember that Python3 now allows Unicode names for identifiers for much the same reasons.

> No no no. Addition of Unicode identifiers has a well-designed, deliberate specification, with a PEP and all. The support for non-ASCII digits in float appears to be ad-hoc, and not founded on actual needs of actual users.

Please note that we didn't have PEPs and the PEP process at the time. The Unicode proposal predates and in some respects inspired the PEP process.

The decision to add this support was deliberate, based on the desire to support as many of the nice features of Unicode in Python as we could. At least that was what was driving me at the time.

Regarding actual needs of actual users: I don't buy that as an argument when it comes to supporting a standard that is meant to attract users with non-ASCII origins.

Some references you may want to read up on:

http://en.wikipedia.org/wiki/Numbers_in_Chinese_culture
http://en.wikipedia.org/wiki/Vietnamese_numerals
http://en.wikipedia.org/wiki/Korean_numerals
http://en.wikipedia.org/wiki/Japanese_numerals

Even MS Office supports them:

http://languages.siuc.edu/Chinese/Language_Settings.html

>> Note that the support in float() (and the other numeric constructors) to work with Unicode code points was explicitly added when Unicode support was added to Python and has been available since Python 1.6.

> That doesn't necessarily make it useful. Alexander's complaint is that it makes Python unstable (i.e. changing as the UCD changes).

If that were true, then all Unicode Character Database (UCD) changes would make Python unstable. However, most changes to existing code points in the UCD are bug fixes, so they actually have a stabilizing quality more than a destabilizing one.
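The dependency on the UCD can be inspected from Python itself. The following is a rough sketch using the standard unicodedata module; the sample characters and the `rows` list are chosen only for illustration, and the version string printed depends on the interpreter in use.

```python
import unicodedata

# Version of the Unicode Character Database bundled with this interpreter.
print(unicodedata.unidata_version)

# ASCII five, ARABIC-INDIC DIGIT FIVE, DEVANAGARI DIGIT FIVE.
rows = []
for ch in "5\u0665\u096b":
    rows.append((unicodedata.name(ch),
                 unicodedata.category(ch),   # general category from the UCD
                 unicodedata.decimal(ch),    # decimal digit value from the UCD
                 int(ch)))                   # what the int() constructor returns

for row in rows:
    print(row)
```

All three characters carry the general category Nd and the decimal value 5, so a change to these properties in a later UCD release is what would alter the result of the numeric constructors.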

>> It is not a bug by any definition of "bug"

> Most certainly it is: the documentation is either underspecified, or deviates from the implementation (when taking the most plausible interpretation). This is the very definition of "bug".

The implementation is not buggy, and neither was this a bug in the 2.x series of the Python documentation. The Python 3.x docs apparently introduced a reference to the language spec, which clearly does not capture the wealth of possible inputs.

So, yes, we're talking about a documentation bug, but not an implementation bug.

--
Marc-Andre Lemburg
eGenix.com

