[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Hrvoje Niksic hrvoje.niksic at avl.com
Tue Apr 28 15:06:17 CEST 2009


Lino Mastrodomenico wrote:

> Since this byte sequence [b'\xed\xb3\xbf'] doesn't represent a valid character when decoded with UTF-8, it should simply be considered an invalid UTF-8 sequence of three bytes and decoded to '\udced\udcb3\udcbf' (not '\udcff').

"Should be considered" or "will be considered"? Python 3.0's UTF-8 decoder happily accepts it and returns '\udcff':

>>> b'\xed\xb3\xbf'.decode('utf-8')
'\udcff'

If the PEP depends on this being changed, it should be mentioned in the PEP.
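To make the two candidate results concrete, here is a minimal sketch that decodes the same three bytes under different error handlers. It assumes a later Python 3 release in which the 'surrogateescape' and 'surrogatepass' handlers exist and the strict UTF-8 decoder rejects encoded surrogates; it does not reproduce the Python 3.0 behaviour shown above. The helper name decode_variants is made up for illustration.

```python
# Hypothetical helper, not part of the PEP or of any library.
raw = b'\xed\xb3\xbf'  # the UTF-8-style encoding of the lone surrogate U+DCFF


def decode_variants(data):
    """Decode `data` as UTF-8 under three error handlers and collect the results."""
    results = {}
    try:
        results['strict'] = data.decode('utf-8')
    except UnicodeDecodeError as exc:
        # A strict decoder that treats encoded surrogates as invalid ends up here.
        results['strict'] = exc
    # Each undecodable byte 0xNN is mapped to the lone surrogate U+DCNN.
    results['surrogateescape'] = data.decode('utf-8', errors='surrogateescape')
    # The three bytes are accepted as one encoded surrogate code point.
    results['surrogatepass'] = data.decode('utf-8', errors='surrogatepass')
    return results


if __name__ == '__main__':
    for handler, value in decode_variants(raw).items():
        print(handler, ascii(value))
```

Under these assumptions the 'surrogateescape' result is the three-character string '\udced\udcb3\udcbf' from the quoted text, and encoding it again with the same handler gives back the original bytes, while 'surrogatepass' yields the single character '\udcff'.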


