[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

"Martin v. Löwis" martin at v.loewis.de
Sat Apr 25 18:33:17 CEST 2009


> I see two main user-oriented use cases for the resulting Unicode strings this PEP will produce on all systems: displaying a list of filenames for the user to select from (an open file dialog), and allowing a user to edit or supply a filename (a save dialog or a rename control).

There are more, in particular the case "user passes a file name on the command line", and "web server passes URL in environment variable".

> It's clear what this PEP provides for the former. On well-behaved systems where a simpler filesystemencoding approach would work, the results are identical; the user can select filenames that are what he expects to see on both Unix and Windows. On less well-behaved systems, some characters may appear as junk in the middle of the name (or would they be invisible?)

Depends on the rendering. Try "print u'\udc00'" in your terminal to see what happens; for me, it renders the glyph for "replacement character". In GUI applications, you often see white boxes (rectangles).
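For illustration, here is a minimal sketch of where such lone surrogates come from. It assumes Python 3.1 or later, where the error handler described by the PEP ended up under the name "surrogateescape"; the file name is made up for the example.

```python
# Sketch only: assumes the "surrogateescape" handler name from Python 3.1+.
# The byte string is a hypothetical Latin-1 file name, invalid as UTF-8.
raw = b"caf\xe9.txt"

# Each undecodable byte 0xNN is mapped to the lone surrogate U+DCNN.
name = raw.decode("utf-8", "surrogateescape")

# Collect the escaped code points; these are what a terminal or GUI
# has to render somehow (replacement glyph, white box, ...).
escaped = [hex(ord(ch)) for ch in name if 0xDC80 <= ord(ch) <= 0xDCFF]

# Encoding with the same handler restores the original bytes.
round_tripped = name.encode("utf-8", "surrogateescape")
```

How the escaped character is displayed is entirely up to the terminal or toolkit; the string itself round-trips regardless.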

> What I don't find clear is what the risks are for the latter. On the less well behaved system, a user may well attempt to use this python application to fix filenames. Can we estimate a likelihood that edits to the names would result in a Unicode string that can no longer be encoded with the python-escape? Will a new name fully provided by a user on his keyboard (ignoring copy and paste) almost always safely encode?

That very much depends on the system setup, and your impression is right that the PEP doesn't address it - it only deals with cases where you get random unsupported bytes; getting random unsupported characters from the user is not considered.

If the user has the locale set up in a way that matches his keyboard, it should all work fine, and it already does, even without the PEP. If the user enters a character that cannot be encoded as part of a file name, you get an exception and have to tell the user to pick a different filename.
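A rough sketch of that check, again assuming Python 3.1+ and the "surrogateescape" handler name. The helper can_encode and the sample names are hypothetical, and the encoding is passed in explicitly rather than taken from the locale, so that the result does not depend on the machine it runs on.

```python
def can_encode(name, encoding):
    """Return True if name can be turned into file name bytes.

    Hypothetical helper for illustration; a real application would
    typically use sys.getfilesystemencoding() as the encoding.
    """
    try:
        name.encode(encoding, "surrogateescape")
        return True
    except UnicodeEncodeError:
        return False


# A plain ASCII name always encodes.
plain_ok = can_encode("report.txt", "ascii")

# A name that came from undecodable bytes still encodes, because the
# handler maps U+DC80..U+DCFF back to the bytes 0x80..0xFF.
escaped_ok = can_encode("caf\udce9.txt", "ascii")

# A character typed by the user that the encoding lacks does not.
euro_ok = can_encode("price\u20ac.txt", "ascii")

# Neither does a lone surrogate outside the escape range.
stray_ok = can_encode("bad\udc00.txt", "utf-8")
```

When the check fails, the application has no bytes to hand to the system, so asking the user for a different name is about all it can do.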

Notice that it may fail at several layers:

Regards, Martin


