[Python-Dev] Python and the Unicode Character Database
Steven D'Aprano steve at pearwood.info
Thu Dec 2 01:17:51 CET 2010
- Previous message: [Python-Dev] Python and the Unicode Character Database
- Next message: [Python-Dev] Python and the Unicode Character Database
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Martin v. Löwis wrote:
> > > And here, my observation stands: if they wanted to, they currently couldn't - at least not for real numbers (and also not for integers if they want to use grouping). So the presumed application of this feature doesn't actually work, despite the presence of the feature it was supposedly meant to enable.
> >
> > By that argument, English speakers wanting to enter integers using Arabic numerals can't either!
>
> That's correct, and the key point here for the argument. It's just not meant to support localized number forms, but deliberately constrains them to a formal grammar which users using it must be aware of in order to use it.
You're agreeing that English speakers can't enter integers using Arabic numerals? What do you think I'm doing when I do this?
>>> int("1234")
1234
Ah wait... did you think I meant Arabic numerals in the sense of digits used by Arabs in Arabia? I meant Arabic numerals as opposed to Roman numerals. Sorry for the confusion.
Your argument was that even though Python's int() supports many non-ASCII digits, the lack of grouping means that it "doesn't actually work". If that argument were correct, it would apply equally to ASCII digits.
It's clearly nonsense to say that int("1234") "doesn't work" just because of the lack of grouping. It's equally nonsense to say that int("١٢٣٤") "doesn't work" because of the lack of grouping.
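To make the parallel concrete, here is a small check, assuming CPython 3 (where int() accepts any Unicode decimal digit, category Nd). The ASCII and Arabic-Indic spellings parse to the same value, and a grouping comma is rejected in both scripts alike.

```python
# Assumes CPython 3, where int() accepts any Unicode decimal digit (category Nd).
ascii_digits = "1234"
arabic_indic = "\u0661\u0662\u0663\u0664"  # "١٢٣٤": ARABIC-INDIC DIGIT ONE..FOUR

# Both spellings parse to the same integer.
print(int(ascii_digits), int(arabic_indic))

# A comma used for digit grouping is rejected in both scripts alike.
rejected = []
for grouped in ("1,234", "\u0661,\u0662\u0663\u0664"):
    try:
        int(grouped)
    except ValueError:
        rejected.append(grouped)
print(rejected)
```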
[...]
> I take it that you speak in favor of the float syntax also being used for the float() constructor.
I'm sorry, I don't understand what you mean here. I've repeatedly said that the syntax for numeric literals should remain constrained to the ASCII digits, as it currently is.
n = ١٢٣٤
gives a SyntaxError, and I don't want to see that change.
But I've also argued that, since the int() and float() constructors currently accept non-ASCII digit strings:
n = int("١٢٣٤")
we should continue to support the existing behaviour. None of the arguments against it seem convincing to me, particularly since the opponents of the current behaviour admit that there is a use-case for it, but they just want it to move elsewhere, such as the locale module.
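A minimal sketch of the distinction being drawn here, assuming CPython 3: the compiler refuses a literal spelled with Arabic-Indic digits, while int() and float() accept the same digits when they arrive in a string at run time. The source string and the name literal_ok are only for illustration.

```python
# Assumes CPython 3. A numeric literal written with non-ASCII digits is
# rejected by the compiler, while int() and float() accept the same digits
# when they arrive as a string at run time.
source = "n = \u0661\u0662\u0663\u0664"  # n = ١٢٣٤

try:
    compile(source, "<example>", "exec")
    literal_ok = True
except SyntaxError:
    literal_ok = False

print("literal accepted:", literal_ok)
print(int("\u0661\u0662\u0663\u0664"))            # "١٢٣٤" parsed from a string
print(float("\u0661\u0662\u0663\u0664.\u0665"))   # "١٢٣٤.٥" with an ASCII point
```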
We've even heard from one person -- I forget who, sorry -- who claimed that C++ has the same behaviour, and if you want ASCII-only digits, you have to explicitly ask for it.
For what it's worth, Microsoft warns developers not to assume users will enter numeric data using ASCII digits:
"Number representation can also use non-ASCII native digits, so your application may encounter characters other than 0-9 as inputs. Avoid filtering on U+0030 through U+0039 to prevent frustration for users who are trying to enter data using non-ASCII digits."
http://msdn.microsoft.com/en-us/magazine/cc163506.aspx
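One way to follow that advice in Python, sketched under the assumption that normalising input to ASCII before further processing is acceptable: map each Unicode decimal digit to its ASCII counterpart with unicodedata.decimal() rather than filtering on U+0030 through U+0039. The helpers to_ascii_digits() and naive_is_number() are hypothetical names for illustration, not part of any library.

```python
import unicodedata


def to_ascii_digits(text: str) -> str:
    """Hypothetical helper: map every Unicode decimal digit (category Nd)
    to its ASCII equivalent and leave all other characters unchanged."""
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)  # None for non-digit characters
        out.append(str(value) if value is not None else ch)
    return "".join(out)


def naive_is_number(text: str) -> bool:
    # The pattern the MSDN article warns against: accept only U+0030..U+0039.
    return text != "" and all("0" <= ch <= "9" for ch in text)


user_input = "\u0661\u0662\u0663\u0664"  # "١٢٣٤" typed with Arabic-Indic digits
print(naive_is_number(user_input))   # the naive filter rejects it
print(user_input.isdecimal())        # str.isdecimal() accepts it
print(to_ascii_digits(user_input))   # normalised to ASCII digits
```

Normalising first keeps later processing ASCII-only while still accepting whatever digits the user actually typed.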
There was a similar discussion going on in Perl-land recently:
http://www.nntp.perl.org/group/perl.perl5.porters/2010/07/msg162400.html
although, being Perl, the discussion was dominated by concerns about regexes and implicit conversions, rather than an explicit call to float() or int() as we are discussing here.
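Python's re module sits on the same fault line. Assuming Python 3 str patterns, \d matches any Unicode decimal digit unless the re.ASCII flag is given, whereas an explicit [0-9] class only matches the ASCII range:

```python
import re

# Assumes Python 3 str patterns, where \d matches any Unicode decimal digit
# unless the re.ASCII flag is given.
text = "\u0661\u0662\u0663\u0664"  # "١٢٣٤"

unicode_match = re.fullmatch(r"\d+", text)           # matches
ascii_match = re.fullmatch(r"\d+", text, re.ASCII)   # no match
range_match = re.fullmatch(r"[0-9]+", text)          # no match

print(bool(unicode_match), bool(ascii_match), bool(range_match))
```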
[...]
> > In the same way, if I wanted to enter a number using non-Arabic digits, it works provided I compromise by using the Anglo-American decimal point instead of the European comma or the native decimal point I might prefer.
>
> Why would you want that, if what you really wanted could not be done? There certainly is a way to convert strings into floats, and there would be a way if that restricted itself to the digits 0..9. So it can't be the mere desire to convert strings to float that makes you ask for non-ASCII digits.
Why do Europeans use programming languages that force them to use a dot instead of a comma as the decimal separator? Why do I misspell string.centre as string.center? Because if you want to get something done, you use the tools you have and not the tools you'd like to have.
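The compromise mentioned in the quoted text can be checked directly. Assuming CPython 3, float() accepts Arabic-Indic digits with an ASCII full stop as the decimal point, but rejects both the European comma and U+066B ARABIC DECIMAL SEPARATOR. The dictionary labels are just for illustration.

```python
# Assumes CPython 3. float() accepts non-ASCII digits only together with the
# ASCII full stop as decimal point; other separators are rejected.
candidates = {
    "ascii point": "\u0661\u0662\u0663\u0664.\u0665",                   # ١٢٣٤.٥
    "european comma": "1234,5",
    "arabic decimal separator": "\u0661\u0662\u0663\u0664\u066b\u0665",  # ١٢٣٤٫٥
}

results = {}
for label, text in candidates.items():
    try:
        results[label] = float(text)
    except ValueError:
        results[label] = None  # rejected by float()

print(results)
```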
-- Steven