[Python-Dev] unicode_string future, str -> basestring, fix or feature

Stephen J. Turnbull stephen at xemacs.org
Tue Mar 4 14:23:52 CET 2014


Guido van Rossum writes:

> Given that the claim "Python 2 doesn't support Unicode filenames" is factually incorrect (in Python 2.7, most filesystem calls in fact do support Unicode, at least on some platforms),

I don't understand what "support Unicode" means. Just that

with open(u"\u4e00", "w") as f: f.write("works!\n")

does what is expected[1] if the user knows what he is doing (i.e., has set PYTHONIOENCODING to a Unicode UTF or one of the Asian encodings)?
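For illustration, here is a minimal Python 3 sketch of that test (in Python 3, str plays the role of Python 2's unicode). It creates the file in a temporary directory instead of the current one, and the helper name write_unicode_named_file is hypothetical. The name passed to open() is encoded with sys.getfilesystemencoding(), which on most POSIX systems is derived from the locale; PYTHONIOENCODING only governs the encoding of sys.stdin, sys.stdout and sys.stderr.

```python
import os
import sys
import tempfile


def write_unicode_named_file(directory):
    """Hypothetical helper: create a file named U+4E00 in *directory*.

    Python 3 sketch; str here corresponds to Python 2's unicode type.
    Returns the full path of the file that was written.
    """
    name = "\u4e00"  # the CJK character for "one"
    path = os.path.join(directory, name)
    # On POSIX, open() encodes the name with the filesystem encoding and
    # raises UnicodeEncodeError if that encoding cannot represent it.
    with open(path, "w", encoding="utf-8") as f:
        f.write("works!\n")
    return path


if __name__ == "__main__":
    print("filesystem encoding:", sys.getfilesystemencoding())
    with tempfile.TemporaryDirectory() as tmp:
        path = write_unicode_named_file(tmp)
        print(os.listdir(tmp))
        with open(path, encoding="utf-8") as f:
            print(f.read(), end="")
```

Using a temporary directory keeps the experiment from leaving a stray file behind.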

> I think individual functions in the os module that are found lacking should be considered bugs, and if someone goes through the effort to supply an otherwise acceptable fix, we shouldn't reject it on the basis that we don't want to consider supporting Unicode filenames.

As above, does "acceptable fix" mean: take whatever the current file system name encoding is, use it to encode unicode objects to str and decode str to unicode, and raise a UnicodeError if that fails?
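A minimal sketch of that definition, written for Python 3 (where bytes corresponds to Python 2's str and str to Python 2's unicode). The helper names encode_fs_name and decode_fs_name are hypothetical; they simply apply the current filesystem encoding with the strict error handler, so any failure surfaces as a UnicodeError subclass.

```python
import sys


def encode_fs_name(name):
    """Hypothetical helper: convert a file name to bytes for the OS.

    bytes pass through unchanged; str is encoded with the current
    filesystem encoding using the strict error handler, so an
    unencodable name raises UnicodeEncodeError (a UnicodeError subclass).
    """
    if isinstance(name, bytes):
        return name
    return name.encode(sys.getfilesystemencoding(), "strict")


def decode_fs_name(raw):
    """Hypothetical inverse of encode_fs_name: bytes -> str.

    str passes through unchanged; undecodable bytes raise
    UnicodeDecodeError (also a UnicodeError subclass).
    """
    if isinstance(raw, str):
        return raw
    return raw.decode(sys.getfilesystemencoding(), "strict")
```

For example, decode_fs_name(encode_fs_name("\u4e00")) round-trips when the filesystem encoding is UTF-8, while encode_fs_name("\ud800") (a lone surrogate) raises UnicodeEncodeError. Python 3's os.fsencode() and os.fsdecode() perform a similar conversion, but on POSIX they use the surrogateescape error handler instead of failing, which is a different design choice from the strict behavior described here.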

I think it's important to define this somewhat carefully, because this area has a strong tendency toward "mission creep". Given that the builtin open "works" by the above definition, I guess it's reasonable to accept such patches.

Footnotes: [1] It writes the line "works!\n" to a file whose name consists of the single Chinese character for "one".


