[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Baptiste Carvello baptiste13z at free.fr
Wed Apr 29 10:43:49 CEST 2009


Lino Mastrodomenico wrote:

> Only for the new utf-8b encoding (if Martin agrees), while the existing utf-8 is fine as is (or at least waaay outside the scope of this PEP).

This is questionable. It would mean that \udcxx in a Python string sometimes denotes a surrogate and sometimes denotes raw bytes, depending on the history of the string.

By contrast, if the new utf-8b codec superseded the old one, \udcxx would always mean raw bytes (at least on UCS-4 builds, where surrogates are unused). The ambiguity could thus be avoided.
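
To make the ambiguity concrete, here is a minimal sketch. It assumes a later Python 3 release, where the "surrogateescape" error handler stands in for the proposed utf-8b codec and "surrogatepass" stands in for a utf-8 codec that accepts lone surrogates; neither name comes from this message, and the byte string is a hypothetical example.

```python
# Sketch only: "surrogateescape" stands in for the proposed utf-8b codec
# (an assumption, not the codec as discussed in this thread).

raw = b"caf\x80"  # hypothetical input; 0x80 is not valid UTF-8 here

# History 1: an undecodable byte is escaped as a lone low surrogate.
from_bytes = raw.decode("utf-8", errors="surrogateescape")

# History 2: the same code point written directly as a surrogate.
from_literal = "caf\udc80"

# The two strings compare equal, so their history cannot be recovered.
same = from_bytes == from_literal

# Reading \udc80 as a raw byte restores the original input ...
as_raw_byte = from_literal.encode("utf-8", errors="surrogateescape")

# ... while reading it as a surrogate yields a three-byte sequence.
as_surrogate = from_literal.encode("utf-8", errors="surrogatepass")

print(same, as_raw_byte, as_surrogate)
```

The same string encodes to two different byte sequences depending on which interpretation the codec applies. With a single codec that always treats \udcxx as raw bytes, only the first encoding would exist.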

Baptiste
