[Python-3000] Unicode and OS strings

martin at v.loewis.de
Thu Sep 20 15:51:16 CEST 2007


> On Linux, filenames are byte strings and not character strings.

That's not true, although this is a widespread misunderstanding.

The POSIX standard requires that the character set used for file names be a superset of the portable character set, which includes characters such as '/', the path separator.

> I always have this problem with Python 2.x. I converted the filename (argv[x]) to Unicode to be able to format error messages in full Unicode... but it's not possible. Linux allows invalid UTF-8 filenames even on a full UTF-8 installation (Ubuntu), see Marcin's examples.

True. However, this does not mean that the file names are byte strings - they are character strings in an unspecified/undetermined encoding.
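
A minimal Python sketch of that distinction follows. The file name bytes and the helper try_decode are hypothetical and chosen only for illustration. The same bytes are not valid UTF-8, yet they are a perfectly good character string once the right encoding (here assumed to be Latin-1) is known.

```python
# Hypothetical file name as the OS might hand it over: "café.txt" in Latin-1.
raw_name = b"caf\xe9.txt"


def try_decode(name_bytes, encoding):
    """Return the decoded name, or None if the bytes are invalid in that encoding."""
    try:
        return name_bytes.decode(encoding)
    except UnicodeDecodeError:
        return None


# 0xE9 starts a multi-byte UTF-8 sequence, but '.' is not a continuation byte,
# so decoding as UTF-8 fails.
as_utf8 = try_decode(raw_name, "utf-8")

# Interpreted as Latin-1, the same bytes are an ordinary character string.
as_latin1 = try_decode(raw_name, "latin-1")

print(as_utf8)
print(as_latin1)
```

The bytes themselves do not say which encoding applies, which is why decoding with the locale's encoding can fail even on a system configured for UTF-8.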

Regards, Martin


