[Python-Dev] Missing FAQ about Python3 and unicode

Victor Stinner victor.stinner at haypocalc.com
Wed Dec 31 01:49:32 CET 2008


Hi,

We keep getting the same questions about Python 3 and Unicode. Maybe it's time to start a FAQ? Here is a rough first draft ;-)

(1) Exit on undecodable command line arguments

$ LANG=en_GB.UTF-8 python3.0 test.py $'\xff'
Could not convert argument 2 to string
$

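Is this expected behaviour? Yes! Python 3.0 converts command-line arguments to str using the locale encoding, and exits if an argument cannot be decoded (here, the byte 0xff is not valid UTF-8).

Example of this question: http://bugs.python.org/issue3023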
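To make the failure above concrete, here is a small model of the strict decoding step. It is only a sketch, not CPython's startup code: the function name decode_argv is hypothetical, and it simply assumes each raw argument is decoded with the locale encoding (UTF-8 for en_GB.UTF-8) using the strict error handler, reporting the index of the first argument that fails.

```python
def decode_argv(raw_args, encoding="utf-8"):
    """Hypothetical model: decode raw argv bytes strictly, as Python 3.0 must.

    Raises SystemExit with a message shaped like the one shown above when an
    argument is not valid in the given encoding. Index 0 is the program name.
    """
    args = []
    for index, raw in enumerate(raw_args):
        try:
            # The default "strict" handler raises on invalid byte sequences.
            args.append(raw.decode(encoding))
        except UnicodeDecodeError:
            raise SystemExit("Could not convert argument %d to string" % index)
    return args


# b"\xff" is not valid UTF-8, so the third argument (index 2) fails:
# decode_argv([b"python3.0", b"test.py", b"\xff"])
```

With raw arguments [b"python3.0", b"test.py", b"\xff"], the byte 0xff at index 2 is what cannot become a str.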

(2) Undecodable filenames

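os.listdir(str) returns a list of str, so it cannot give you filenames that are not decodable with the filesystem encoding: in Python 3.0 they are silently omitted from the result.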

Solution: use os.listdir(bytes), which returns a list of bytes. To display such a filename to the user, use a function like:

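import sys

def humanFilename(filename):
    # filename is a bytes object, e.g. an item returned by os.listdir(bytes)
    encoding = sys.getfilesystemencoding()
    # Decode for display only: undecodable bytes become U+FFFD
    return filename.decode(encoding, "replace")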

See also http://bugs.python.org/issue3187
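A sketch of the whole workaround, assuming you want to keep the raw bytes name for further os calls and decode only for display. The helper list_display_names is hypothetical; it relies only on os.listdir() accepting a bytes path and on bytes.decode() with the "replace" error handler.

```python
import os
import sys


def list_display_names(directory, encoding=None):
    """Hypothetical helper: return (raw_name, display_name) pairs.

    *directory* must be a bytes path, so os.listdir() returns bytes names
    and entries that are not decodable are not dropped.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    pairs = []
    for raw in sorted(os.listdir(directory)):
        # Keep raw for os calls (open, stat, ...); decode only for display,
        # replacing undecodable bytes with U+FFFD.
        pairs.append((raw, raw.decode(encoding, "replace")))
    return pairs
```

Usage: `for raw, shown in list_display_names(b"."): print(shown)`. Pass raw, not shown, back to open() or os.stat(), because the replacement character no longer matches the name on disk.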

(3) Bytes environment variables

In Python 3.0, os.environ only contains variables that can be decoded. Undecodable variables are skipped when os.environ is created, but the original variables still exist in the process environment at the C level, so child processes still see them:

$ A=$(echo -e "\xff") B=c ./python
Python 3.1a0 (py3k:67973M, Dec 31 2008, 00:51:49)
>>> import os
>>> os.environ.get('A'), os.environ.get('B')
(None, 'c')
>>> retcode=os.system('echo -n $A|hexdump -C')
00000000  ff  |.|
00000001
>>> retcode=os.system('echo -n $B|hexdump -C')
00000000  63  |c|
00000001
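The skipping rule can be modelled in a few lines. This is a sketch under stated assumptions, not the code CPython uses to build os.environ: decodable_environ is a hypothetical name, the raw environment is assumed to be a list of (bytes, bytes) pairs, and UTF-8 stands in for the locale encoding.

```python
def decodable_environ(raw_environ, encoding="utf-8"):
    """Hypothetical model of how Python 3.0 builds os.environ.

    Takes raw (name, value) bytes pairs and returns a str -> str dict.
    Pairs that cannot be decoded are skipped in the dict, while the raw
    list itself is left untouched (like the C-level environment).
    """
    result = {}
    for raw_name, raw_value in raw_environ:
        try:
            name = raw_name.decode(encoding)
            value = raw_value.decode(encoding)
        except UnicodeDecodeError:
            continue  # missing from the mapping, still present in raw_environ
        result[name] = value
    return result


env = decodable_environ([(b"A", b"\xff"), (b"B", b"c")])
# (env.get("A"), env.get("B")) == (None, "c"), matching the session above
```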

Discussion about supporting bytes environment variables: http://mail.python.org/pipermail/python-dev/2008-December/083856.html

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


