[Python-Dev] lone surrogates in utf-8

Antoine Pitrou solipsis at pitrou.net
Tue Apr 28 15:13:37 CEST 2009


Hrvoje Niksic <hrvoje.niksic avl.com> writes:

"Should be considered" or "will be considered"? Python 3.0's UTF-8 decoder happily accepts it and returns u'\udcff':

    >>> b'\xed\xb3\xbf'.decode('utf-8')
    '\udcff'

Yes, there is already a bug entry for it: http://bugs.python.org/issue3672

I think we could happily fix it for 3.1 (perhaps leaving 2.7 unchanged for compatibility reasons - I don't know if some people may rely on the current behaviour).
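
For illustration, here is a minimal sketch of what the stricter behaviour looks like, assuming a current CPython 3 release in which the strict UTF-8 decoder rejects surrogates. The names `LONE_SURROGATE_BYTES` and `decodes_strictly` are hypothetical and exist only for this example; the `surrogatepass` error handler is shown as one way to opt back in to the lenient decoding quoted above.

```python
# The three bytes from the example above: a UTF-8-style encoding of the
# lone low surrogate U+DCFF.
LONE_SURROGATE_BYTES = b'\xed\xb3\xbf'


def decodes_strictly(data):
    """Return True if data decodes as UTF-8 under the default strict handler."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


# Assumption: on an interpreter with the stricter decoder this is False.
strict_ok = decodes_strictly(LONE_SURROGATE_BYTES)

# The 'surrogatepass' handler explicitly allows surrogate code points through.
passed_through = LONE_SURROGATE_BYTES.decode('utf-8', 'surrogatepass')
```

Code that relies on the old lenient behaviour could switch to an explicit error handler of this kind rather than depending on the default.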


