[Python-Dev] Unicode locale values in 2.7

"Martin v. Löwis" martin at v.loewis.de
Thu Dec 3 14:49:13 CET 2009


But in trunk, the value is just used as-is. So when formatting a decimal, for example, '\xc2\xa0' is just inserted into the result, such as:

>>> format(Decimal('1000'), 'n')
'1\xc2\xa0000'

This doesn't make much sense

I agree with Antoine: it makes sense, and is the correct answer, given the locale definition.

Now, I think that the locale definition is flawed - it's not a property of the Czech language or culture that the "no-break space" character is the thousands-separator. If anything other than the regular space should be the thousands separator, it should be "thin space", and it should be used in all locales on a system that currently use space. Having it just in the Czech locale is a misconfiguration, IMO.

But if we accept the system's locale definition, then the above is certainly the right answer.

and causes an error when converting it to unicode:

>>> format(Decimal('1000'), u'n')

You'll need to decode in the locale's encoding, then it would work. Unfortunately, that is difficult to achieve.
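As a rough illustration of "decode in the locale's encoding", here is a minimal sketch. It is written for Python 3, where a bytes object stands in for the 2.7 str returned by locale.localeconv(); the helper name decode_locale_value is hypothetical, and the UTF-8 encoding in the usage lines is an assumption about the Czech locale in the example above.

```python
import locale


def decode_locale_value(value, encoding=None):
    """Decode a byte-string locale value into text.

    Hypothetical helper: `value` models a 2.7 str taken from
    locale.localeconv(); `encoding` defaults to the encoding that
    the current locale reports.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already text: nothing to decode.
    return value


# Raw thousands separator from the example above, assuming the
# locale's encoding is UTF-8 (an assumption, not stated in the thread).
raw_sep = b'\xc2\xa0'
sep = decode_locale_value(raw_sep, 'utf-8')

# The two bytes decode to the single character U+00A0 (no-break space).
formatted = u'1' + sep + u'000'
```

The difficulty mentioned above is in knowing which encoding applies: the sketch simply trusts the caller or locale.getpreferredencoding(), which may not match the encoding of the LC_NUMERIC category actually in effect.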

I believe that the correct solution is to do what py3k does in locale, which is to convert the struct lconv values to unicode. But since this would be a disruptive change if universally applied, I'd like to propose that we only convert to unicode if the values won't fit into a str.

I think Guido is on record as strongly objecting to that kind of API.

So the algorithm would be something like:

1. call mbstowcs
2. if every value in the result is in the range [32, 126], return a str
3. otherwise, return a unicode
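The three steps above could be sketched roughly as follows. This is an illustration only, not code from trunk: it is written for Python 3, with bytes standing in for the 2.7 str and str standing in for unicode, and bytes.decode() is used as a stand-in for the C-level mbstowcs call. The function name narrow_if_ascii and the explicit encoding parameter are hypothetical.

```python
def narrow_if_ascii(raw, encoding):
    """Return a byte string if `raw` is printable ASCII, else text.

    Hypothetical sketch of the proposed conversion for one
    localeconv() value.
    """
    # Step 1: multibyte -> wide characters (stand-in for mbstowcs).
    text = raw.decode(encoding)

    # Step 2: every code point in [32, 126] -> keep the narrow type.
    if all(32 <= ord(ch) <= 126 for ch in text):
        return text.encode('ascii')

    # Step 3: otherwise return the wide (unicode) value.
    return text


comma = narrow_if_ascii(b',', 'utf-8')          # stays a byte string
nbsp = narrow_if_ascii(b'\xc2\xa0', 'utf-8')    # becomes text
```

Note that the return type depends on the data rather than on the call, so callers have to handle both types.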

Not sure what API you are describing here - the algorithm for doing what?

This would mean that for most locales, the current behavior in trunk wouldn't change: the locale.localeconv() values would continue to be str. Only for those locales where the values wouldn't fit into a str would unicode be returned.

Does this seem like an acceptable change?

Definitely not. This will be just for 2.7, and I see no point in producing such an incompatibility. Applications may already perform the conversion themselves, and that would break under such a change.

Regards, Martin
