[Python-Dev] PEP 263 in the works (Non-ASCII characters in test_pep277.py in 2.3)

Guido van Rossum guido@python.org
Mon, 07 Oct 2002 13:31:03 -0400


Now that I can edit UTF-8 directly, I find a "feature" made possible by the PEP 263 support of Python 2.3 rather puzzling:

Let's say I edit a file testencoding.py in XEmacs with UTF-8 support:

(Note that I'm viewing this as Latin-1. The comment, s and u in the source are all three the same: a-umlaut, o-umlaut, u-umlaut.)

    # -*- coding: utf-8; -*-
    # comment äöü
    s = "äöü"
    u = u"äöü"
    print s
    print u.encode('latin-1')
    print 'works !'

With Python 2.3 this prints:

    Ã¤Ã¶Ã¼
    äöü
    works !

I would have expected that s turns out as "äöü" using print, since that's how I wrote it in the source file.

No, because stdout isn't assumed to be UTF-8. The string s is your string encoded in UTF-8, and those are the bytes written by print.
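
To make the byte-level picture concrete, here is a minimal sketch. It is written for Python 3 (an assumption; the thread is about Python 2.3), so a bytes object stands in for the Python 2 string s, and the variable names are illustrative only.

```python
# Python 3 sketch: bytes model the Python 2 str from the example above.
text = "\u00e4\u00f6\u00fc"          # a-umlaut, o-umlaut, u-umlaut

# Under a utf-8 coding line, the literal s = "..." holds the UTF-8 bytes.
s = text.encode("utf-8")

# A Latin-1 terminal shows each byte as one character, so the
# six UTF-8 bytes appear as six Latin-1 characters.
seen_on_latin1_terminal = s.decode("latin-1")

# u.encode('latin-1') yields three bytes, one per character.
u_latin1 = text.encode("latin-1")

# ascii() keeps the output printable whatever the stdout encoding is.
print(len(s), ascii(s))
print(ascii(seen_on_latin1_terminal))
print(len(u_latin1), ascii(u_latin1))
```

The second line of output is the escaped form of the six characters that a Latin-1 display shows for print s; the third is the three bytes that the same display renders as the intended umlauts.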

This suggests to me that mixing string and Unicode literals using non-ASCII characters in a single file should probably be avoided.

Or it suggests that we need a way to deal with encodings on stdout more gently.
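
One possible reading of "more gently", sketched here only as an illustration and not as a concrete proposal from this thread: keep text as Unicode and encode it for the output stream at write time, replacing characters the target encoding cannot represent instead of raising an error. The helper name write_text is hypothetical, the code is Python 3, and io.BytesIO objects stand in for stdout on terminals with different encodings.

```python
import io

def write_text(stream, text, encoding):
    """Hypothetical helper: encode text for a byte stream.

    Characters the encoding cannot represent become '?' rather
    than raising UnicodeEncodeError.
    """
    stream.write(text.encode(encoding, errors="replace"))

text = "\u00e4\u00f6\u00fc"  # a-umlaut, o-umlaut, u-umlaut

# Stand-in for a Latin-1 terminal: every character is representable.
latin1_out = io.BytesIO()
write_text(latin1_out, text, "latin-1")

# Stand-in for an ASCII-only terminal: each character is replaced.
ascii_out = io.BytesIO()
write_text(ascii_out, text, "ascii")

print(latin1_out.getvalue())  # b'\xe4\xf6\xfc'
print(ascii_out.getvalue())   # b'???'
```

The design choice in this sketch is that the stream's encoding, not the source file's encoding, decides which bytes get written.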

--Guido van Rossum (home page: http://www.python.org/~guido/)