[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
"Martin v. Löwis" martin at v.loewis.de
Tue Apr 28 18:49:23 CEST 2009
- Previous message: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
- Next message: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
> It does solve this issue, because (unlike e.g. U+F01FF) '\udcff' is not a valid Unicode character (not a character at all, really) and the only way you can put this in a POSIX filename is if you use a very lenient UTF-8 encoder that gives you b'\xed\xb3\xbf'.
>
> Since this byte sequence doesn't represent a valid character when decoded with UTF-8, it should simply be considered an invalid UTF-8 sequence of three bytes and decoded to '\udced\udcb3\udcbf' (not '\udcff'). Martin: maybe the PEP should say this explicitly?
Sure, will do.
Regards, Martin
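
For readers who want to check the decoding described in the quoted text, here is a minimal sketch. It assumes a Python 3 interpreter that provides the 'surrogateescape' error handler proposed by this PEP, and it uses the separate 'surrogatepass' handler as a stand-in for the "very lenient UTF-8 encoder" mentioned above. The variable names are illustrative only and do not come from the PEP.

```python
# Sketch only: assumes Python 3 with the 'surrogateescape' and
# 'surrogatepass' error handlers available.

# The three bytes a lenient UTF-8 encoder would emit for U+DCFF.
raw = b'\xed\xb3\xbf'

# A strict UTF-8 decoder rejects the sequence, since it would decode
# to a lone surrogate.
try:
    raw.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With surrogateescape, each undecodable byte is mapped on its own to
# U+DC00 + byte value, giving three code points rather than '\udcff'.
decoded = raw.decode('utf-8', 'surrogateescape')

# Encoding with the same handler restores the original bytes.
roundtrip = decoded.encode('utf-8', 'surrogateescape')

# '\udcff' itself stands for the single undecodable byte 0xff.
lone = '\udcff'.encode('utf-8', 'surrogateescape')

# 'surrogatepass' plays the role of the lenient encoder here.
lenient = '\udcff'.encode('utf-8', 'surrogatepass')

# ascii() is used because a str holding lone surrogates cannot be
# written to a UTF-8 terminal directly.
print(strict_ok, ascii(decoded), roundtrip, lone, lenient)
```

Because the three bytes decode to three separate escape code points, a filename containing them round-trips to the same bytes and cannot be confused with a filename containing the single byte 0xff.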