[Python-Dev] Python-3.0, unicode, and os.environ

Hagen Fürstenau hfuerstenau at gmx.net
Sun Dec 7 10:35:15 CET 2008


>> As far as I can see all Python Unicode strings can be encoded to UTF-8,
>> even things like lone surrogates because Python doesn't care about them.
>> So both the Unicode API and the binary API would be fail-safe on Windows.
>
> Python is broken and needs to be fixed.
>
> http://bugs.python.org/issue3672
> http://bugs.python.org/issue3297

But the question of whether Python should care about lone surrogates or not is at best tangential to the issue at hand. If you have lone surrogates in the Unicode API (and didn't raise an exception on the way getting there), then the sensible thing is to encode them into lone UTF-8 surrogates. Even if you wanted to prevent lone surrogates, encoding to UTF-8 for the binary API would not be the place to enforce it.
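
To make the point concrete, here is a minimal sketch of what encoding lone surrogates for a binary API could look like. It assumes a later Python 3 release (3.1 or newer), where the strict UTF-8 codec rejects lone surrogates and the "surrogatepass" error handler lets them through. The helper names encode_for_binary_api and decode_from_binary_api are hypothetical, chosen only for illustration; they are not part of os.environ or any existing API.

```python
# Sketch only: the helper names below are hypothetical, not an existing API.

def encode_for_binary_api(text):
    """Encode text to UTF-8, letting lone surrogates pass through."""
    return text.encode("utf-8", "surrogatepass")


def decode_from_binary_api(data):
    """Decode UTF-8 bytes, accepting encoded lone surrogates."""
    return data.decode("utf-8", "surrogatepass")


lone = "\ud800"  # a lone high surrogate

# The strict codec (Python 3.1 and later) refuses to encode it.
try:
    lone.encode("utf-8")
    strict_rejects = False
except UnicodeEncodeError:
    strict_rejects = True

# With "surrogatepass" the surrogate becomes the three-byte sequence
# ED A0 80, and decoding with the same handler restores the string.
encoded = encode_for_binary_api("PATH" + lone)
round_tripped = decode_from_binary_api(encoded)
```

Where validation happens is a separate choice: in this sketch the strict codec is the place that enforces it, while the pass-through encoding used for the binary API never fails on any Python string.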


