[Python-Dev] PEP 393 Summer of Code Project

"Martin v. Löwis" martin at v.loewis.de
Thu Aug 25 09:50:08 CEST 2011


> What about things like the surrogateescape codec that deliberately use code units in non-standard ways? Will tricks like that still be possible if the code-unit level is hidden from the programmer?

Most certainly. In the PEP-393 representation, the surrogate characters can readily be represented (and would imply at least the two-byte form), but they will never take on their UTF-16 function (i.e. the UTF-8 codec won't try to combine surrogate pairs), so they can be used for surrogateescape and other purposes. Of course, in strict error mode, codecs will refuse to encode them (notice that surrogateescape is an error handler, not a codec).

Regards, Martin
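
The behaviour described in the message above can be illustrated with a small sketch. It is not part of the original message and assumes Python 3.3 or later, where PEP 393 is implemented; the helper names decode_lossless and encode_lossless are hypothetical and exist only for this illustration. The str.decode/str.encode calls and the "surrogateescape" error handler are standard.

```python
def decode_lossless(raw: bytes) -> str:
    """Decode UTF-8, mapping each undecodable byte 0xNN to the lone
    surrogate U+DCNN via the surrogateescape error handler.
    (Hypothetical helper name, for illustration only.)"""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_lossless(text: str) -> bytes:
    """Inverse of decode_lossless: lone surrogates U+DC80..U+DCFF are
    turned back into the original bytes. (Hypothetical helper name.)"""
    return text.encode("utf-8", errors="surrogateescape")


# A Latin-1 encoded byte string that is not valid UTF-8.
raw = b"caf\xe9"

text = decode_lossless(raw)
# The invalid byte 0xE9 is now the lone surrogate U+DCE9, stored as one
# code point in the string.
print(ascii(text))

# The round trip restores the original bytes.
print(encode_lossless(text) == raw)

# In strict error mode the codec refuses to encode the lone surrogate.
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict mode refused:", exc.reason)

# Surrogates never take their UTF-16 role: a high/low pair stays two
# separate code points and is not combined into U+1F600.
pair = "\ud83d\ude00"
print(len(pair), pair == "\U0001F600")
```

Note that surrogateescape is passed through the errors argument rather than as the codec name, which matches the point that it is an error handler, not a codec.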


