[Python-Dev] unicode_internal codec and the PEP 393

"Martin v. Löwis" martin at v.loewis.de
Wed Nov 9 22:03:52 CET 2011

> The unicode_internal decoder doesn't decode surrogate pairs and so test_unicode.UnicodeTest.test_codecs() is failing on Windows (16-bit wchar_t). I don't know if this codec is still relevant with the PEP 393 because the internal representation now depends on the maximum character (Py_UCS1*, Py_UCS2* or Py_UCS4*), whereas it was a fixed size with Python <= 3.2 (Py_UNICODE*).
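To illustrate the failure mode described above, here is a minimal Python sketch. It is not the codec's actual C code; the function names are hypothetical and little-endian byte order is assumed. It contrasts a decoder that maps every 16-bit unit to one code point with one that joins surrogate pairs:

```python
import struct

def naive_decode_16bit(data: bytes) -> str:
    """Treat each 16-bit little-endian unit as a code point (no pair joining)."""
    units = struct.unpack("<%dH" % (len(data) // 2), data)
    return "".join(chr(u) for u in units)

def pair_aware_decode(data: bytes) -> str:
    """Decode as UTF-16, which combines a high/low surrogate pair into one character."""
    return data.decode("utf-16-le")

ch = "\U0001F600"             # a character outside the BMP
raw = ch.encode("utf-16-le")  # two 16-bit units, as with a 16-bit wchar_t

naive = naive_decode_16bit(raw)
joined = pair_aware_decode(raw)
print(len(naive), len(joined))
```

The naive result contains two lone surrogates instead of the original character, which is the kind of mismatch a round-trip test would catch.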

The current status is the way it is because we (Torsten and I) didn't bother figuring out the purpose of the internal codec.

> Should we:
>
>  * Drop this codec (public and documented, but I don't know if it is used)
>  * Use wchar_t* (Py_UNICODE*) to provide a result similar to Python 3.2, and so fix the decoder to handle surrogate pairs
>  * Use the real representation (Py_UCS1*, Py_UCS2* or Py_UCS4* string)

It's described as "Return the internal representation of the operand". That would suggest that the last choice (i.e. return the real internal representation) would be best, except that this doesn't round-trip. Adding a prefix byte indicating the kind (and perhaps also the ASCII flag) would then be closest to the real representation.

As that is likely not very useful, and might break some applications of the encoding (if there are any at all) which might expect to pass unicode-internal strings across Python versions, I would then also deprecate the encoding.
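The round-trip problem and the prefix-byte idea can be sketched in pure Python. This is only an illustration under stated assumptions: the helper names are hypothetical, little-endian byte order is assumed, and the ASCII flag is left out. It mimics the PEP 393 storage widths rather than reading the real buffer:

```python
def kind_of(s: str) -> int:
    """Bytes per character that PEP 393 would pick for s (1, 2 or 4)."""
    top = max(map(ord, s), default=0)
    return 1 if top < 0x100 else 2 if top < 0x10000 else 4

def encode_raw(s: str) -> bytes:
    """The bare buffer contents, with nothing recording the width used."""
    kind = kind_of(s)
    return b"".join(ord(c).to_bytes(kind, "little") for c in s)

def encode_prefixed(s: str) -> bytes:
    """Same buffer, preceded by one byte holding the kind (assumed format)."""
    return bytes([kind_of(s)]) + encode_raw(s)

def decode_prefixed(data: bytes) -> str:
    """Invert encode_prefixed by reading the kind byte first."""
    kind, payload = data[0], data[1:]
    return "".join(
        chr(int.from_bytes(payload[i:i + kind], "little"))
        for i in range(0, len(payload), kind)
    )

# Two different strings whose raw buffers are byte-for-byte identical:
assert encode_raw("AB") == encode_raw("\u4241") == b"AB"

# With the kind prefix, the two are distinguishable and decoding round-trips.
assert encode_prefixed("AB") != encode_prefixed("\u4241")
assert decode_prefixed(encode_prefixed("\u4241")) == "\u4241"
```

Without the prefix, a decoder given b"AB" cannot tell whether it holds two 1-byte characters or one 2-byte character, which is why the bare representation does not round-trip.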

Regards, Martin
