[Python-Dev] Re: Regression in unicodestr.encode()?
Tim Peters tim.one@comcast.net
Tue, 09 Apr 2002 21:13:37 -0400
- Previous message: [Python-Dev] Re: Regression in unicodestr.encode()?
- Next message: [Python-Dev] Re: Regression in unicodestr.encode()?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
[Guido]
I knew all that, but I thought I'd read about a hack to encode NUL using c0 80, specifically to get around the limitation on encoded strings containing a NUL.
Ah, that violates the "shortest encoding" rule, so it is invalid UTF-8. I'm sure people have done it, though, and that many UTF-8 decoders accept it. Python's doesn't:
>>> unicode('\xc0\x80', 'utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: UTF-8 decoding error: illegal encoding
Believe it or not, accepting non-shortest encodings is considered to be "a security hole"(!). That's a sad story of its own ...
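[The security concern is that overlong forms give a character more than one byte-level spelling, so a filter that scans the raw bytes for, say, NUL or '/' can be bypassed by an alternate encoding that decodes to the same character. A modern Python 3 sketch of the behavior discussed above (the thread itself predates Python 3, where `unicode()` and the `\xc0\x80` string literal no longer apply):]

```python
# The shortest -- and only valid -- UTF-8 encoding of U+0000 is one NUL byte.
assert "\x00".encode("utf-8") == b"\x00"

# The overlong two-byte spelling C0 80 is still rejected by Python's decoder,
# just as in the 2.x session quoted above (the error type and message differ).
try:
    b"\xc0\x80".decode("utf-8")
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```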