[Python-Dev] PEP 383 (again)

Hrvoje Niksic hrvoje.niksic at avl.com
Tue Apr 28 14:41:19 CEST 2009

Lino Mastrodomenico wrote:
> Let's suppose that I use Python 2.x or something else to create a file with name b'\xff'. My (Linux) system has a sane configuration and the filesystem encoding is UTF-8, so it's an invalid name but the kernel will blindly accept it anyway.
>
> With this PEP, Python 3.1 listdir() will convert b'\xff' to the string '\udcff'.

One question that really bothers me about this proposal is the following:

Assume a UTF-8 locale. A file named b'\xff', being an invalid UTF-8 sequence, will be converted to the half-surrogate '\udcff'. However, a file named b'\xed\xb3\xbf', a valid[1] UTF-8 sequence, will also be converted to '\udcff'. Those are quite different POSIX pathnames; how will Python know which one it was when I later pass '\udcff' to open()?

A poster hinted at this question, but I haven't seen it answered yet.

[1] I'm assuming that it's valid UTF-8 because it passes through Python 2.5's '\xed\xb3\xbf'.decode('utf-8'). I don't claim to be a UTF-8 expert.
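
The collision asked about above can be checked directly. The following is a minimal sketch, assuming a Python 3 interpreter in which the PEP's error handler is available under the name 'surrogateescape' (the name it eventually shipped with, not one used in this message) and assuming UTF-8 as the filesystem encoding. The variable names are illustrative only.

```python
# Sketch: decode both byte names the way a PEP 383 style listdir() would,
# then check whether they collapse to the same str.
# Assumption: the handler is registered as 'surrogateescape' (Python 3.1+).

invalid_name = b'\xff'            # not a valid UTF-8 sequence
surrogate_name = b'\xed\xb3\xbf'  # UTF-8-style byte pattern for U+DCFF

decoded_invalid = invalid_name.decode('utf-8', 'surrogateescape')
decoded_surrogate = surrogate_name.decode('utf-8', 'surrogateescape')

# ascii() shows the surrogate code points as escapes instead of raw characters.
print(ascii(decoded_invalid))
print(ascii(decoded_surrogate))

# If the two strings were equal, open() could not tell the files apart.
collide = decoded_invalid == decoded_surrogate
print("collision:", collide)

# Encoding with the same handler should give back the original bytes.
roundtrip_invalid = decoded_invalid.encode('utf-8', 'surrogateescape')
roundtrip_surrogate = decoded_surrogate.encode('utf-8', 'surrogateescape')
print(roundtrip_invalid == invalid_name, roundtrip_surrogate == surrogate_name)
```

On the Python 3 interpreters this sketch targets, the strict UTF-8 decoder treats b'\xed\xb3\xbf' as invalid rather than as an encoded surrogate, so each of its bytes is escaped separately and the two names stay distinct. That describes released interpreters, not necessarily the draft under discussion in this thread, and Python 2.5's more lenient decoder behaves as described in footnote [1].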
