[Python-Dev] Unicode locale values in 2.7
"Martin v. Löwis" martin at v.loewis.de
Thu Dec 3 14:49:13 CET 2009
But in trunk, the value is just used as-is. So when formatting a decimal, for example, '\xc2\xa0' is just inserted into the result, such as:
    format(Decimal('1000'), 'n')
    '1\xc2\xa0000'
This doesn't make much sense
I agree with Antoine: it makes sense, and is the correct answer, given the locale definition.
Now, I think that the locale definition is flawed - it's not a property of the Czech language or culture that the "no-break space" character is the thousands-separator. If anything other than the regular space should be the thousands separator, it should be "thin space", and it should be used in all locales on a system that currently use space. Having it just in the Czech locale is a misconfiguration, IMO.
But if we accept the system's locale definition, then the above is certainly the right answer.
and causes an error when converting it to unicode:
format(Decimal('1000'), u'n')
You'll need to decode in the locale's encoding, then it would work. Unfortunately, that is difficult to achieve.
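A minimal sketch of that decoding step, assuming the locale's encoding is UTF-8 (which the bytes '\xc2\xa0' suggest). The helper name decode_locale_value is hypothetical and not part of the locale module; only locale.getpreferredencoding() is a real call:

```python
import locale


def decode_locale_value(raw, encoding=None):
    """Decode a locale-formatted byte string into text.

    Hypothetical helper: if no encoding is given, fall back to the
    encoding the locale module reports for the current locale.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return raw.decode(encoding)


# The byte string from the example above, with the encoding passed
# explicitly (assumed to be UTF-8 for this sketch).
formatted = b'1\xc2\xa0000'
text = decode_locale_value(formatted, 'utf-8')
# text is now u'1\xa0000': one NO-BREAK SPACE character, not two bytes
```

The sketch takes the encoding as an argument; it does not address how to determine the correct encoding reliably.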
I believe that the correct solution is to do what py3k does in locale, which is to convert the struct lconv values to unicode. But since this would be a disruptive change if universally applied, I'd like to propose that we only convert to unicode if the values won't fit into a str.
I think Guido is on record as strongly objecting to that kind of API.
So the algorithm would be something like:
1. call mbstowcs
2. if every value in the result is in the range [32, 126], return a str
3. otherwise, return a unicode
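The three quoted steps can be sketched roughly as follows. This only illustrates the proposal as quoted, not an endorsed or existing API: the function name str_or_unicode is hypothetical, bytes.decode() stands in for the C-level mbstowcs call, and the encoding is passed explicitly as an assumption.

```python
def str_or_unicode(raw, encoding):
    """Illustrate the quoted proposal (hypothetical function).

    Returns a byte string when every character is printable ASCII,
    and a unicode string otherwise.
    """
    # Step 1: stand-in for mbstowcs, using the locale's encoding.
    wide = raw.decode(encoding)
    # Step 2: every code point in [32, 126] fits into a plain str.
    if all(32 <= ord(ch) <= 126 for ch in wide):
        return wide.encode('ascii')
    # Step 3: otherwise hand back the unicode value.
    return wide


comma = str_or_unicode(b',', 'utf-8')        # stays a byte string
nbsp = str_or_unicode(b'\xc2\xa0', 'utf-8')  # becomes unicode
```

Note that the return type depends on the value of the input, so callers cannot know in advance which type they will get.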
Not sure what API you are describing here - the algorithm for doing what?
This would mean that for most locales, the current behavior in trunk wouldn't change: the locale.localeconv() values would continue to be str. Only for those locales where the values wouldn't fit into a str would unicode be returned.
Does this seem like an acceptable change?
Definitely not. This will be just for 2.7, and I see no point in producing such an incompatibility. Applications may already perform the conversion themselves, and that would break under such a change.
Regards, Martin