[Python-Dev] a suggestion ... Re: PEP 383 (again)

glyph at divmod.com
Thu Apr 30 18:26:25 CEST 2009


On 03:35 pm, martin at v.loewis.de wrote:

So, why do you prefer half surrogate coding to U+0000 quoting? If I pass a string with an embedded U+0000 to gtk, gtk will truncate the string, and stop rendering it at this character. This is worse than what it does for invalid UTF-8 sequences. Chances are fairly high that other C libraries will fail in the same way, in particular if they expect char* (which is very common in C).

Hmm. I believe the intended failure mode here, for PyGTK at least, is actually this:

TypeError: GtkLabel.set_text() argument 1 must be string without null bytes, not unicode

APIs in PyGTK which accept NULLs and silently truncate are probably broken. Although perhaps I've just made your point even more strongly: one, because the behavior is inconsistent, and two, because it sometimes raises an exception if a NULL is present, and apparently the goal here is to prevent exceptions from being raised anywhere in the process.
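To make the two behaviors concrete (silent truncation versus an exception), here is a minimal Python 3 sketch. It uses ctypes as a stand-in for any C library that expects char*; it is not PyGTK's actual code, and both `truncated_by_c` and `set_text_strict` are hypothetical helpers invented for illustration.

```python
import ctypes


def truncated_by_c(data: bytes) -> bytes:
    """Round-trip bytes through a C char*; reading stops at the first NUL."""
    return ctypes.c_char_p(data).value


def set_text_strict(text: str) -> bytes:
    """Hypothetical binding wrapper: refuse embedded NULs instead of truncating."""
    if "\0" in text:
        raise TypeError("argument must be string without null characters")
    return text.encode("utf-8")


# Silent truncation: everything after the NUL is lost without any error.
print(truncated_by_c(b"/home/user/na\0me.txt"))  # b'/home/user/na'

# Strict behavior: the caller finds out immediately.
try:
    set_text_strict("na\0me.txt")
except TypeError as exc:
    print("rejected:", exc)
```

A library that mixes the two styles gives callers the inconsistency described above: some calls lose data quietly, others raise.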

For this idiom to be of any use to GTK programs, gtk.FileChooser.get_filename() will probably need to be changed, since (in py2) it currently returns a str, not unicode.

The PEP should say something about how GUI libraries should handle file choosers, so that they'll be consistent and compatible with the standard library. Perhaps only that file choosers need to take this PEP into account, and the rest is obvious. Or maybe the right thing for GTK to do would be to continue to use bytes on POSIX and convert to text on Windows, since open(), listdir() et al. will continue to accept bytes for filenames?
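For reference, the round trip a file chooser would have to respect can be sketched as follows. This assumes Python 3.1 or later, where the PEP's error handler is available under the name "surrogateescape"; the helper names `to_text` and `to_bytes` and the sample filename are made up for illustration, and UTF-8 is assumed as the filesystem encoding.

```python
def to_text(raw: bytes) -> str:
    """Decode a filename; undecodable bytes become lone surrogates U+DC80..U+DCFF."""
    return raw.decode("utf-8", "surrogateescape")


def to_bytes(text: str) -> bytes:
    """Encode back to the original bytes, reversing the surrogate escaping."""
    return text.encode("utf-8", "surrogateescape")


raw = b"caf\xe9.txt"  # Latin-1 encoded name; not valid UTF-8
text = to_text(raw)

# The invalid byte 0xE9 is carried as the half surrogate U+DCE9.
assert text == "caf\udce9.txt"

# The original bytes are recovered exactly.
assert to_bytes(text) == raw

# A strict encode (what a GUI toolkit is likely to do) fails instead.
try:
    text.encode("utf-8")
except UnicodeEncodeError:
    print("strict UTF-8 encoding rejects the escaped name")
```

A toolkit that hands such a string to a strict encoder raises, while one that returns bytes on POSIX sidesteps the question, which is the trade-off raised above.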

So I prefer the half surrogate because its failure mode is better th

Heh heh heh.


