[Python-Dev] Unicode strings as filenames

Martin v. Loewis martin@v.loewis.de
Thu, 3 Jan 2002 22:52:19 +0100


> What's the correct way to deal with filenames in a Unicode environment? Consider this:
>
> >>> import site
> >>> site.encoding
> 'latin-1'

Setting site.encoding is certainly the wrong thing to do. How can you know all users of your system use latin-1?

> If I change my site's default encoding back to ascii, the second open fails:
>
> >>> import site
> >>> site.encoding
> 'ascii'
> >>> a = "abc\xe4\xfc\xdf.txt"
> >>> u = unicode (a, "latin-1")

On my system, the following works fine:

>>> import locale
>>> locale.setlocale(locale.LC_ALL,"")
'LC_CTYPE=de_DE;LC_NUMERIC=de_DE;LC_TIME=de_DE;LC_COLLATE=C;LC_MONETARY=de_DE;LC_MESSAGES=de_DE;LC_PAPER=de_DE;LC_NAME=de_DE;LC_ADDRESS=de_DE;LC_TELEPHONE=de_DE;LC_MEASUREMENT=de_DE;LC_IDENTIFICATION=de_DE'
>>> a = "abc\xe4\xfc\xdf.txt"
>>> u = unicode (a, "latin-1")
>>> open(u, "w")
<open file 'abcäüß.txt', mode 'w' at 0x8173e88>

On Unix, your best bet for file names is to trust the user's locale settings. If you do that, open will accept Unicode objects.
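For readers on a current Python, the "trust the user's locale" advice can be sketched in Python 3. This is an illustration, not code from this thread, and `locale_encoding` and `encode_filename` are hypothetical helper names. The sketch adopts the user's LC_CTYPE setting, asks `locale.getpreferredencoding(False)` which codec that implies, and encodes a text filename into the bytes a Unix kernel actually stores. Python 3's own `os.fsencode` performs a similar conversion using the filesystem encoding chosen at interpreter startup, so treat this as an explanatory sketch rather than something to reuse.

```python
import locale


def locale_encoding():
    """Return the codec implied by the user's LC_CTYPE setting (hypothetical helper)."""
    # Adopt the user's environment for LC_CTYPE, as setlocale(LC_ALL, "")
    # does in the session above; ignore environments naming an unknown locale.
    try:
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass
    return locale.getpreferredencoding(False)


def encode_filename(name, encoding=None):
    """Turn a text filename into the bytes passed to the Unix kernel (hypothetical helper)."""
    if encoding is None:
        encoding = locale_encoding()
    # Strict encoding: raise UnicodeEncodeError rather than guess when the
    # chosen codec cannot represent the name.
    return name.encode(encoding)


if __name__ == "__main__":
    name = "abc\xe4\xfc\xdf.txt"  # "abcäüß.txt"
    print("locale codec:", locale_encoding())
    print(encode_filename(name, "latin-1"))
    print(encode_filename(name, "utf-8"))
```

The same name yields `b'abc\xe4\xfc\xdf.txt'` under latin-1 but a longer byte sequence under UTF-8, and passing `"ascii"` raises `UnicodeEncodeError`, since ASCII has no byte for `ä`, `ü` or `ß`.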

What is your locale?

> Is that the correct approach? Apparently Python's file object doesn't do this under the covers. Should it?

No. There is no established convention on Unix for how to encode non-ASCII file names. If anything, following the user's locale setting is the most reasonable thing to do; it should be in sync with how the user's terminal displays characters. The Python installation's default encoding is almost useless, and shouldn't be changed.
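To make the lack of a convention concrete, here is a small Python 3 illustration (not from the thread; `RAW_NAME` and `decode_filename` are hypothetical names). It takes the bytes a latin-1 user's program would store for `abcäüß.txt` and reads them back under several codec assumptions. Nothing in the bytes themselves says which reading is intended; only the reader's locale can.

```python
# Bytes that a latin-1 user's program would store for "abcäüß.txt".
RAW_NAME = b"abc\xe4\xfc\xdf.txt"


def decode_filename(raw, encoding):
    """Decode raw filename bytes with one codec; return None if they don't fit."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    # The same on-disk bytes, read back under different codec assumptions.
    for codec in ("latin-1", "koi8_r", "utf-8", "ascii"):
        print(codec, repr(decode_filename(RAW_NAME, codec)))
```

Under latin-1 the name reads back as `abcäüß.txt`; under KOI8-R the same bytes decode without error into different (Cyrillic) letters; UTF-8 and ASCII reject them outright. A single installation-wide default cannot be right for all of these users at once.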

On Windows, things are much better, since there is a notion of Unicode file names in the system.

Regards, Martin