[Python-Dev] surrogatepass - she's a witch, burn 'er! [was: Cleaning up ...]
Isaac Morland ijmorlan at uwaterloo.ca
Fri Aug 29 13:22:10 CEST 2014
On Fri, 29 Aug 2014, M.-A. Lemburg wrote:
> On 29.08.2014 02:41, Stephen J. Turnbull wrote:
>
> Since Python allows working with lone surrogates in Unicode (they are
> valid code points) and we're using UTF-8 for marshal, we needed a way to
> make sure that Python 3 also optionally supports working with lone
> surrogates in such UTF-8 streams (nowadays called CESU-8:
> http://en.wikipedia.org/wiki/CESU-8).
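Lemburg's point about lone surrogates can be illustrated with a minimal sketch, assuming CPython 3 and its built-in "surrogatepass" error handler; the helper names strict_encode_fails and surrogatepass_roundtrip are hypothetical. Strict UTF-8 refuses to encode U+D800, while surrogatepass writes it as the three-byte sequence ED A0 80 and reads it back unchanged.

```python
# Sketch: CPython's "surrogatepass" error handler lets a lone surrogate
# survive a UTF-8 round trip that the strict utf-8 codec refuses.
from typing import Tuple

LONE_HIGH = "\ud800"  # a lone high surrogate: a valid code point, not a scalar value


def strict_encode_fails(text: str) -> bool:
    """Return True if the strict UTF-8 encoder rejects `text`."""
    try:
        text.encode("utf-8")  # errors="strict" is the default
    except UnicodeEncodeError:
        return True
    return False


def surrogatepass_roundtrip(text: str) -> Tuple[bytes, str]:
    """Encode and decode with surrogatepass, returning both stages."""
    data = text.encode("utf-8", "surrogatepass")
    return data, data.decode("utf-8", "surrogatepass")


if __name__ == "__main__":
    print(strict_encode_fails(LONE_HIGH))
    print(surrogatepass_roundtrip(LONE_HIGH))
```

The bytes ED A0 80 produced here are exactly what a strict UTF-8 decoder must reject, which is the crux of the disagreement in this thread.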
If I want that, wouldn't I specify "cesu-8" as the encoding?
That is, instead of .decode('utf-8') I would use .decode('cesu-8'). Right now, trying this gives an error saying that cesu-8 is an unknown encoding, but that could be changed without affecting the behaviour of the utf-8 codec.
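As a sketch of this suggestion, a separate codec can be registered with codecs.register() without touching the utf-8 codec. The codec name "cesu-8" and all helper functions below are hypothetical (CPython ships no such codec). This version ignores the errors argument and, unlike a strict CESU-8 decoder, does not reject four-byte sequences or unpaired surrogates.

```python
import codecs
from typing import Optional, Tuple


def _split_astral(text: str) -> str:
    """Replace each code point above U+FFFF with its UTF-16 surrogate pair."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            out.append(chr(0xD800 + (cp >> 10)))    # high surrogate
            out.append(chr(0xDC00 + (cp & 0x3FF)))  # low surrogate
        else:
            out.append(ch)
    return "".join(out)


def _join_pairs(text: str) -> str:
    """Combine each high+low surrogate pair back into one code point."""
    out = []
    i = 0
    while i < len(text):
        hi = ord(text[i])
        if 0xD800 <= hi <= 0xDBFF and i + 1 < len(text):
            lo = ord(text[i + 1])
            if 0xDC00 <= lo <= 0xDFFF:
                out.append(chr(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)))
                i += 2
                continue
        out.append(text[i])  # BMP character or unpaired surrogate, kept as-is
        i += 1
    return "".join(out)


def cesu8_encode(text: str, errors: str = "strict") -> Tuple[bytes, int]:
    # `errors` is ignored in this sketch; surrogates always pass through.
    return _split_astral(text).encode("utf-8", "surrogatepass"), len(text)


def cesu8_decode(data, errors: str = "strict") -> Tuple[str, int]:
    raw = bytes(data)  # the codec machinery may hand us a memoryview
    return _join_pairs(raw.decode("utf-8", "surrogatepass")), len(raw)


def _search(name: str) -> Optional[codecs.CodecInfo]:
    # Python 3.9+ passes "cesu_8"; older versions pass "cesu-8".
    if name.replace("-", "_") == "cesu_8":
        return codecs.CodecInfo(cesu8_encode, cesu8_decode, name="cesu-8")
    return None


codecs.register(_search)
```

With this registered, "\U0001F600".encode("cesu-8") produces six bytes (two three-byte encoded surrogates) rather than the four bytes of UTF-8, and .decode("cesu-8") joins the pair back into one character, while .decode("utf-8") behaves exactly as before.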
It seems to me that .decode('utf-8') should decode exactly and only valid UTF-8, which rules out accepting encoded surrogates, such as surrogate pairs used as an intermediate encoding step.
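The strict reading of .decode('utf-8') can be sketched as follows, assuming CPython 3; the helper and constant names are hypothetical. The same character, U+1F600, appears once as well-formed UTF-8 and once as a CESU-8-style pair of encoded surrogates, and only the first passes a strict decode. Decoding the second with surrogatepass yields two separate surrogate code points rather than one character, so surrogatepass on its own is not a CESU-8 decoder.

```python
# Sketch: strict UTF-8 validation versus decoding with surrogatepass.

UTF8_SMILEY = b"\xf0\x9f\x98\x80"           # U+1F600 as one 4-byte UTF-8 sequence
CESU8_SMILEY = b"\xed\xa0\xbd\xed\xb8\x80"  # U+1F600 as two encoded surrogates


def is_valid_utf8(data: bytes) -> bool:
    """True only if `data` is well-formed UTF-8 (no encoded surrogates)."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


def decode_with_surrogatepass(data: bytes) -> str:
    """Decode with the utf-8 codec, letting encoded surrogates through."""
    return data.decode("utf-8", "surrogatepass")
```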
Isaac Morland            CSCF Web Guru
DC 2554C, x36650         WWW Software Specialist