[Python-Dev] Python and the Unicode Character Database

"Martin v. Löwis" martin at v.loewis.de
Fri Dec 3 00:19:20 CET 2010

On 02.12.2010 23:43, M.-A. Lemburg wrote:

Eric Smith wrote:

The current behavior should go nowhere; it is not useful. Something very similar to the current behavior (but done correctly) should go into the locale module.

I agree with everything Martin says here. I think the basic premise is: you won't find strings "in the wild" that use non-ASCII digits but do use the ASCII dot as a decimal point. And that's what float() is looking for. (And that doesn't even begin to address what it expects for an exponent 'e'.)

http://en.wikipedia.org/wiki/Decimal_mark

"In China, comma and space are used to mark digit groups because dot is used as decimal mark."
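For readers who want to see the float() behavior under discussion, here is a minimal sketch, assuming a current CPython 3 interpreter. The variable names are illustrative only and do not come from the thread. It shows float() accepting Arabic-Indic digits together with the ASCII dot, and rejecting the same digits when they are combined with the Arabic decimal separator (U+066B).

```python
import unicodedata

# Arabic-Indic digits one, two, three (written as escapes for portability)
arabic_indic = "\u0661\u0662\u0663"

# The Unicode Character Database assigns each of them a decimal digit value
digit_values = [unicodedata.decimal(ch) for ch in arabic_indic]

# float() accepts the non-ASCII digits, but only with the ASCII dot
mixed = arabic_indic + "." + "\u0664\u0665"
value = float(mixed)

# The script's own decimal separator (U+066B) is not accepted
try:
    float(arabic_indic + "\u066b" + "\u0664\u0665")
    native_separator_ok = True
except ValueError:
    native_separator_ok = False

print(digit_values, value, native_separator_ok)
```

This is the combination Eric describes as unlikely to occur "in the wild": non-ASCII digits mixed with an ASCII decimal point.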

I may be misinterpreting that, but I think that refers to the case of writing numbers using Arabic digits.

"Chinese" digits are, e.g., used in the Suzhou numerals

http://en.wikipedia.org/wiki/Suzhou_numerals

This doesn't have a decimal point at all. Instead, the second line (below or to the left of the actual digits) describes the power of ten and the unit of measurement (i.e. similar to scientific notation, but with ideographs for the powers of ten).

In another writing system, they use 点 (U+70B9) as the decimal separator, see

http://en.wikipedia.org/wiki/Chinese_numerals#Fractional_values

In the same system, the integral part uses multipliers, i.e. 12345 is [1][10000][2][1000][3][100][4][10][5]; the fractional part uses regular digits.
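The structure described above can be sketched as follows. This is a deliberately simplified illustration, not a complete parser: the function name parse_chinese_number and its tables are hypothetical, it only handles the pattern where every multiplier is preceded by a single digit (as in the 12345 example), and it does not handle larger groupings such as a multi-digit count before 万.

```python
# Hypothetical lookup tables for this sketch (simplified characters)
DIGITS = {ch: i for i, ch in enumerate("零一二三四五六七八九")}
MULTIPLIERS = {"十": 10, "百": 100, "千": 1000, "万": 10000}
DECIMAL_POINT = "\u70b9"  # 点


def parse_chinese_number(text):
    """Parse [digit][multiplier]... 点 [digits] into a float (simplified)."""
    integral, _, fractional = text.partition(DECIMAL_POINT)
    total = 0
    pending = None  # digit still waiting for its multiplier
    for ch in integral:
        if ch in DIGITS:
            pending = DIGITS[ch]
        elif ch in MULTIPLIERS and pending is not None:
            total += pending * MULTIPLIERS[ch]
            pending = None
        else:
            raise ValueError(f"unsupported character {ch!r}")
    if pending is not None:
        total += pending  # trailing units digit
    # The fractional part uses plain digits, one per decimal place
    if any(ch not in DIGITS for ch in fractional):
        raise ValueError("fractional part must contain digits only")
    frac = "".join(str(DIGITS[ch]) for ch in fractional)
    return float(f"{total}.{frac or '0'}")


print(parse_chinese_number("一万二千三百四十五"))
print(parse_chinese_number("一万二千三百四十五点六七"))
```

The point of the sketch is only that the integral part cannot be read digit by digit the way float() reads a string: each digit has to be paired with its multiplier, while the part after 点 is positional.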

Regards, Martin
