[Python-Dev] logging module broken because of locale

"Martin v. Löwis" martin at v.loewis.de
Tue Jul 18 21:26:28 CEST 2006


James Y Knight wrote:

> That seems backwards of how it should be ideally: the byte-string upper and lower should always do ascii uppering-and-lowering, and the unicode ones should do it according to locale. Perhaps that can be cleaned up in py3k?

Cleaned-up, yes. But it is currently not backwards.

For a byte string, you need an encoding, which comes from the locale. So for byte strings, case-conversion has to be locale-aware (in principle, making it encoding-aware only would almost suffice, but there is no universal API for that).
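To illustrate why a byte string cannot be case-converted without knowing its encoding, here is a minimal sketch in Python 3 syntax. The helper name upper_bytes is hypothetical, and the two codecs are just convenient examples, not anything taken from the thread. The same byte 0xE4 is a lower-case letter in Latin-1 but an upper-case letter in KOI8-R, so the "right" result differs.

```python
def upper_bytes(data: bytes, encoding: str) -> bytes:
    """Upper-case a byte string by interpreting it in the given encoding.

    Hypothetical helper for illustration only: decode, convert, re-encode.
    """
    return data.decode(encoding).upper().encode(encoding)


raw = b"\xe4"

# In Latin-1, 0xE4 is "ä"; its upper-case form "Ä" is 0xC4.
latin1_result = upper_bytes(raw, "latin-1")

# In KOI8-R, 0xE4 is "Д", which is already upper case, so nothing changes.
koi8r_result = upper_bytes(raw, "koi8-r")

# Without any encoding, Python 3's bytes.upper() only touches ASCII letters.
ascii_only_result = raw.upper()

print(latin1_result, koi8r_result, ascii_only_result)
```

Note that Python 3's bytes.upper() takes the ASCII-only route, so it leaves the byte untouched in both cases.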

OTOH, for Unicode, due to the unification, case-conversion mostly does not need to be locale-aware. Nearly all case-conversions are only script-dependent, not language-dependent. So it is nearly possible to make case-conversion locale-independent, and that is what Python provides.
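A small sketch of the locale-independent behaviour, written for Python 3, where str plays the role of the unicode type discussed here. The sample characters are my own choice. The Turkish case at the end is a well-known language-dependent mapping that I am adding as an illustration; it is not quoted from this message.

```python
# Case conversion of Unicode strings depends on the character (script),
# not on the process locale: no locale is consulted or passed in.
samples = {
    "latin": "ä",
    "greek": "λ",
    "cyrillic": "д",
}
uppercased = {script: ch.upper() for script, ch in samples.items()}

# A language-dependent exception: Turkish would want "i" -> "İ" (U+0130),
# but the default, locale-independent mapping yields plain "I".
turkish_i = "i".upper()

print(uppercased, turkish_i)
```

A locale-aware variant would have to be told the language explicitly, which the built-in method has no way to accept.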

The "nearly" above refers to very few exceptions, in very few languages. Most of the details are collected in UAX#21, some highlights are:

I believe the unicode.lower behaviour is currently right for most applications, so it should continue to be the default. An additional locale-aware version should be added, but that probably means incorporating ICU into Python, to get this and other locale properties right in a platform-independent fashion.

Regards, Martin


