[Python-Dev] str.ascii_lower
Martin v. Loewis martin at v.loewis.de
Mon Dec 29 13:04:56 EST 2003
- Previous message: [Python-Dev] str.ascii_lower
- Next message: [Python-Dev] urllib2 doesn't grok URLs w/ user/passwd
Jeff Epler wrote:
    >>> u"I".lower()   # Python bug? (should be u'\u0131')
    u'i'
As Guido says, unicode.lower() is locale-unaware: it determines the lower-case character from the Unicode Consortium's character properties, not from the current locale.
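In Python 3, str.lower() plays the role of the unicode.lower() discussed here and still ignores the locale. If Turkish casing is wanted, it has to be requested explicitly. A minimal sketch, assuming only the two Turkish-specific letters need special treatment: the hypothetical helper turkish_lower() maps "I" to dotless U+0131 and U+0130 (capital I with dot above) to "i", then defers to the locale-independent str.lower() for everything else.

```python
# Python 3: str.lower() uses Unicode character data and ignores the locale.

def turkish_lower(text: str) -> str:
    """Hypothetical helper: lower-case text using Turkish rules for I and U+0130."""
    # First map the two letters whose Turkish lower-case form differs from the
    # Unicode default, then let the locale-independent str.lower() do the rest.
    # U+0131 (dotless i) and "i" are already lower case, so lower() keeps them.
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()


print("I".lower())                     # prints: i  (Unicode default mapping)
print(turkish_lower("I"))              # prints: ı  (U+0131, LATIN SMALL LETTER DOTLESS I)
print(turkish_lower("\u0130STANBUL"))  # prints: istanbul
```

Mapping U+0130 explicitly matters: plain str.lower() turns it into "i" followed by U+0307 COMBINING DOT ABOVE, following the Unicode special-casing data rather than Turkish rules.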
    >>> "I".lower()    # C library bug? (should be "\xc4\xb1")*
    'I'
This is really a limitation of the C language, not of the C library. The interface declared in <ctype.h> is

    int tolower(int c);

where c holds a single byte (a value representable as unsigned char, or EOF), so it can only accept and return a single char. Multi-byte characters are not supported by that interface.
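For characters that cannot be converted, tolower returns its argument unchanged; the C standard requires this. That is why "I".lower() returns 'I' above: the expected lower-case form, "\xc4\xb1", is two bytes and cannot be returned as a single char.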
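The fallback rule can be illustrated with a small pure-Python model. This is a sketch under stated assumptions, not the real C library: model_tolower, lower_bytes, ENGLISH_RULES and TURKISH_RULES are hypothetical names invented for illustration, the rule tables cover only ASCII letters plus a Turkish-style override for "I", and no libc function is called.

```python
# Python 3.9+ (uses dict[str, str] annotations).
EOF = -1  # stand-in for C's EOF macro

# Illustrative rule tables (hypothetical, not real locale data):
# ASCII A-Z -> a-z, plus a Turkish-style override mapping "I" to U+0131.
ENGLISH_RULES = {chr(c): chr(c + 32) for c in range(ord("A"), ord("Z") + 1)}
TURKISH_RULES = {**ENGLISH_RULES, "I": "\u0131"}


def model_tolower(c: int, rules: dict[str, str]) -> int:
    """Hypothetical model of `int tolower(int c)` in a UTF-8 locale: one byte in, one byte out."""
    if c == EOF or c >= 0x80:
        # EOF, or one byte of a multi-byte UTF-8 sequence: this model leaves it alone.
        return c
    target = rules.get(chr(c))
    if target is None:
        return c  # no lower-case mapping: return the argument unchanged
    encoded = target.encode("utf-8")
    if len(encoded) != 1:
        return c  # lower-case form needs several bytes, so it cannot be returned
    return encoded[0]


def lower_bytes(data: bytes, rules: dict[str, str]) -> bytes:
    """Apply model_tolower to each byte, as a byte-oriented lower() built on tolower would."""
    return bytes(model_tolower(b, rules) for b in data)


print(lower_bytes(b"ISTANBUL", TURKISH_RULES))  # prints: b'Istanbul'
print(lower_bytes(b"ISTANBUL", ENGLISH_RULES))  # prints: b'istanbul'
```

Under the Turkish-style rules every letter except "I" is converted: its lower-case form does not fit in one byte, so the model, like the C interface, hands the argument back unchanged.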
    >>> "I".lower()    # (UTF-8 locale works properly in English)
    'i'
This is because "i" is a single byte in UTF-8.
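The byte counts behind this are easy to check from Python 3. utf8_length is a hypothetical helper name used only for this illustration; it simply measures the UTF-8 encoding of a character.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


for ch in ("i", "\u0131"):
    print(f"U+{ord(ch):04X} {ch.encode('utf-8')!r} {utf8_length(ch)} byte(s)")
# prints:
# U+0069 b'i' 1 byte(s)
# U+0131 b'\xc4\xb1' 2 byte(s)
```

The two-byte result for U+0131 is the same "\xc4\xb1" sequence quoted earlier in the thread.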
Regards, Martin