[Python-Dev] Python code.interact() and UTF-8 locale

Victor STINNER victor.stinner-linux at haypocalc.com
Tue Sep 13 15:53:29 CEST 2005


On Tuesday 13 September 2005 at 17:56 +0900, Hye-Shik Chang wrote:

> On 9/11/05, Victor STINNER <victor.stinner-linux at haypocalc.com> wrote:
> > Hi,
> >
> > I found a bug in Python interactive command line (program python alone:
> > looks to be code.interact() function in code.py). With UTF-8 locale, the
> > command << u"é" >> returns << u'\xc3\xa9' >> and not << u'\xE9' >>.
> > Remember: the french e with acute is Unicode 233 (0xE9), encoded \xC3
> > \xA9 in UTF-8.

> Which version of python do you use? From 2.4, the interactive mode
> respects locale as a source code encoding and it falls back to latin-1
> when decoding fails.
>
> Python 2.4.1 (#2, Jul 31 2005, 04:45:53)
> [GCC 3.4.2 [FreeBSD] 20040728] on freebsd5
> Type "help", "copyright", "credits" or "license" for more information.
> >>> u"é"
> u'\xe9'
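To make the fallback described above concrete, here is a minimal sketch written for Python 3 (not the Python 2.4 from this thread). The helper name `decode_source` and its `preferred` parameter are made up for illustration; this is not the interpreter's actual implementation, only the "try the locale encoding, else latin-1" idea.

```python
def decode_source(raw, preferred="utf-8"):
    """Decode raw input bytes with `preferred`, falling back to latin-1.

    Hypothetical helper: it only mimics the fallback described in the thread.
    """
    try:
        return raw.decode(preferred)
    except UnicodeDecodeError:
        # latin-1 maps every byte to one code point, so this cannot fail.
        return raw.decode("latin-1")


utf8_input = b"\xc3\xa9"   # "é" as sent by a UTF-8 terminal
latin1_input = b"\xe9"     # "é" as sent by a latin-1 terminal

# Both inputs end up as the single code point U+00E9.
print(ascii(decode_source(utf8_input)))
print(ascii(decode_source(latin1_input)))

# Decoding the UTF-8 bytes as latin-1 instead gives two characters,
# which matches the u'\xc3\xa9' result reported for code.interact().
print(ascii(utf8_input.decode("latin-1")))
```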

I installed my own Python 2.4 in /opt/python/. I don't know if the right code.py is loaded, but here is the output:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
$ ./python2.4
Python 2.4.1 (#1, Sep 11 2005, 01:37:26)
[GCC 4.0.2 20050821 (prerelease) (Debian 4.0.1-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> u"é"
u'\xe9'
>>> import code
>>> code.interact()
Python 2.4.1 (#1, Sep 11 2005, 01:37:26)
[GCC 4.0.2 20050821 (prerelease) (Debian 4.0.1-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> u"é"
u'\xc3\xa9'
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Well, that works better :-) For code.interact(), you can read my attached patch. I don't know if it is the best way to fix the bug.
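The attached patch is not reproduced in this archive, so the following is not its content. It is only a rough Python 3 sketch of the general idea: decode the raw input with an explicit encoding before passing it to the console. The class name `DecodingConsole` and the method `push_bytes` are hypothetical; `code.InteractiveConsole` and its `push()` method are the real standard-library API.

```python
import code


class DecodingConsole(code.InteractiveConsole):
    """Hypothetical console that decodes raw bytes before running them."""

    def __init__(self, encoding="utf-8", **kwargs):
        code.InteractiveConsole.__init__(self, **kwargs)
        self.encoding = encoding

    def push_bytes(self, raw_line):
        # Decode with the configured encoding, then let the normal
        # InteractiveConsole machinery compile and execute the line.
        return self.push(raw_line.decode(self.encoding))


# The same bytes, interpreted with two different encodings.
raw = b'n = len(u"\xc3\xa9")'

utf8_ns = {}
DecodingConsole(encoding="utf-8", locals=utf8_ns).push_bytes(raw)

latin1_ns = {}
DecodingConsole(encoding="latin-1", locals=latin1_ns).push_bytes(raw)

print(utf8_ns["n"], latin1_ns["n"])
```

With the UTF-8 console the literal is one character long; with the latin-1 console it is two, which is the symptom shown in the session above.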

However, the following code is still buggy in Python 2.4:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
$ cat python_unicode_eval_bug.py
# -*- coding: UTF-8 -*-
print "One Unicode character: %u" % len(u"é")
print "One Unicode character (using eval) : %u" % eval('len(u"é")')
$ python2.4 python_unicode_eval_bug.py
One Unicode character: 1
One Unicode character (using eval) : 2
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

RexFi explained to me that Python can't guess the charset of eval('len(u"é")'). Yes, that is difficult: should it use the locale, or the declared source encoding? So this test doesn't matter.
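The ambiguity can be shown with a small Python 3 sketch. The helper `eval_bytes` is hypothetical; it makes the charset choice explicit, which is exactly the information the byte string passed to eval() lacks in the script above.

```python
def eval_bytes(raw_expr, encoding):
    """Hypothetical helper: decode an expression with a given charset, then eval it."""
    return eval(raw_expr.decode(encoding))


# The bytes of the expression len(u"é") as stored in a UTF-8 file.
expr = b'len(u"\xc3\xa9")'

print(eval_bytes(expr, "utf-8"))    # the literal is one character
print(eval_bytes(expr, "latin-1"))  # the literal is two characters
```

The same bytes give 1 or 2 depending only on the charset assumed, so the evaluated string alone cannot settle which answer is right.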

See you,
Haypo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: code-interact.patch
Type: text/x-patch
Size: 407 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-dev/attachments/20050913/c620d813/code-interact.bin


