[Python-Dev] len(chr(i)) = 2? (original) (raw)

"Martin v. Löwis" martin at v.loewis.de
Mon Nov 22 09:20:59 CET 2010


Unicode 5.0, Chapter 3, verse C9:

When a process generates a code unit sequence which purports to be in a Unicode character encoding form, it shall not emit ill-formed code sequences.

> > A Unicode-conforming Python implementation would error at the > > chr() call, or perhaps would not provide surrogateescape error > > handlers. > > Chapter and verse?

Chapter 3, verse C9 again.

I agree that the surrogateescape error handler is non-conforming, but, as you say, it doesn't claim to, either (would your concern about utf-8 being misleading here been resolved if the thing had been called "utf-8b"?)

More interestingly (and to the subject) is chr: how did you arrive at C9 banning Python3's definition of chr? This chr function puts the code sequence into well-formed UTF-16; that's the whole point of UTF-16.

Regards, Martin



More information about the Python-Dev mailing list