[Python-Dev] Python-3.0, unicode, and os.environ
Adam Olsen rhamph at gmail.com
Sun Dec 7 18:35:53 CET 2008
On Sun, Dec 7, 2008 at 2:35 AM, Hagen Fürstenau <hfuerstenau at gmx.net> wrote:
As far as I can see all Python Unicode strings can be encoded to UTF-8, even things like lone surrogates because Python doesn't care about them. So both the Unicode API and the binary API would be fail-safe on Windows.
Python is broken and needs to be fixed.
http://bugs.python.org/issue3672
http://bugs.python.org/issue3297
But the question of whether Python should care about lone surrogates or not is at best tangential to the issue at hand. If you have lone surrogates in the Unicode API (and didn't raise an exception on the way getting there), then the sensible thing is to encode them into lone UTF-8 surrogates. Even if you wanted to prevent lone surrogates, encoding to UTF-8 for the binary API would not be the place to enforce it.
No. Unicode requires lone surrogates to be treated as errors in UTF-8. If you want to pass them through, then you're creating a custom encoding... which you might argue for in this case, but it needs to be clearly separate from the real UTF-8.
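To make the distinction concrete, here is a minimal sketch. It assumes a current CPython 3 release rather than the 3.0 behavior discussed in this thread: there the strict "utf-8" codec rejects lone surrogates, and the pass-through variant is a separate, explicitly named error handler ("surrogatepass"). The helper names encode_strict and decodes_strictly are made up for illustration.

```python
# Sketch only: assumes a current CPython 3 release, not Python 3.0.
lone = "\ud800"  # a lone high surrogate, not part of a valid pair


def encode_strict(s):
    """Return the strict UTF-8 encoding of s, or None if it is rejected."""
    try:
        return s.encode("utf-8")
    except UnicodeEncodeError:
        return None


def decodes_strictly(data):
    """Return True if data is accepted by the strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Real UTF-8 treats the lone surrogate as an error.
strict_result = encode_strict(lone)

# The pass-through form has to be requested by name, which keeps it
# clearly separate from real UTF-8.
passthrough = lone.encode("utf-8", "surrogatepass")

print(strict_result)                  # None
print(passthrough)                    # b'\xed\xa0\x80'
print(decodes_strictly(passthrough))  # False
```

The bytes produced by the pass-through handler are not valid UTF-8, so a strict decoder rejects them; they only round-trip when the same handler is named again on decode.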
-- Adam Olsen, aka Rhamphoryncus