[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
Lino Mastrodomenico l.mastrodomenico at gmail.com
Tue Apr 28 15:14:19 CEST 2009
- Previous message: [Python-Dev] lone surrogates in utf-8
- Next message: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
2009/4/28 Hrvoje Niksic <hrvoje.niksic at avl.com>:
> Lino Mastrodomenico wrote:
>> Since this byte sequence [b'\xed\xb3\xbf'] doesn't represent a valid
>> character when decoded with UTF-8, it should simply be considered an
>> invalid UTF-8 sequence of three bytes and decoded to
>> '\udced\udcb3\udcbf' (not '\udcff').
>
> "Should be considered" or "will be considered"? Python 3.0's UTF-8
> decoder happily accepts it and returns u'\udcff':
>
> >>> b'\xed\xb3\xbf'.decode('utf-8')
> '\udcff'
Only for the new utf-8b encoding (if Martin agrees), while the existing utf-8 is fine as is (or at least waaay outside the scope of this PEP).
-- Lino Mastrodomenico
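
For readers on a later Python: the "utf-8b" behaviour discussed above shipped in Python 3.1 as the `surrogateescape` error handler rather than as a separate codec, and the strict UTF-8 decoder in those versions rejects encoded surrogates. The sketch below illustrates the mapping under that assumption; the helper names `decode_escaped` and `encode_escaped` are made up for illustration and are not part of the PEP or the standard library.

```python
# The three bytes from the message: the UTF-8 form of the lone surrogate U+DCFF.
raw = b'\xed\xb3\xbf'


def decode_escaped(data):
    """Decode UTF-8, mapping each undecodable byte 0xNN to U+DCNN."""
    return data.decode('utf-8', errors='surrogateescape')


def encode_escaped(text):
    """Encode to UTF-8, turning each U+DCNN escape back into byte 0xNN."""
    return text.encode('utf-8', errors='surrogateescape')


# Strict UTF-8 decoding in Python 3.1+ refuses the sequence outright.
try:
    raw.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With surrogateescape, each of the three bytes is escaped individually.
escaped = decode_escaped(raw)

# Encoding with the same handler restores the original bytes.
round_trip = encode_escaped(escaped)

print(strict_ok)
print(ascii(escaped))
print(round_trip == raw)
```

Decoding to three escaped code points rather than to '\udcff' is what keeps the round trip unambiguous: a single '\udcff' in the decoded string can then only mean the lone byte 0xff.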