Message 160991 - Python tracker

> b'\xe0\x80'.decode('utf-8', 'replace') returns one U+FFFD and not two. I don't think that is right.

I think that one U+FFFD is correct. The only error is a premature end of data.

On Thu, May 17, 2012 at 12:31 PM, Serhiy Storchaka <report@bugs.python.org> wrote:

Serhiy Storchaka <storchaka@gmail.com> added the comment:

The only issue left was about the number of U+FFFD generated with invalid sequences in some cases. My last patch has extensive tests for this, so you could try to apply it (or copy the tests) and see if they all pass.

The tests fail, but I'm not sure that the tests are correct.
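Whether a test is "correct" here comes down to the expected number of U+FFFD per input. A minimal table-driven sketch of such a check follows. The expected counts are an assumption: they follow the "maximal subpart" practice described in the Unicode standard (each maximal ill-formed subpart becomes exactly one U+FFFD). `CASES` and `fffd_mismatches` are hypothetical names, not taken from the patch.

```python
# Hypothetical expected counts, assuming the "maximal subpart" practice:
# each maximal ill-formed subpart is replaced by exactly one U+FFFD.
CASES = [
    (b"\xe0\x80", 2),      # E0 requires A0..BF next, so E0 and 80 are separate errors
    (b"\xe0\xa0", 1),      # valid prefix of a 3-byte sequence, truncated
    (b"\xed\xa0\x80", 3),  # encoded surrogate: ED requires 80..9F next
    (b"\xf0\x80\x80", 3),  # overlong form: F0 requires 90..BF next
    (b"\xf1\x80\x80", 1),  # valid prefix of a 4-byte sequence, truncated
    (b"\x80", 1),          # lone continuation byte
]


def fffd_mismatches(cases):
    """Return (data, expected, actual) for every case whose U+FFFD count differs."""
    bad = []
    for data, expected in cases:
        actual = data.decode("utf-8", "replace").count("\ufffd")
        if actual != expected:
            bad.append((data, expected, actual))
    return bad


if __name__ == "__main__":
    for data, expected, actual in fffd_mismatches(CASES):
        print(f"{data!r}: expected {expected} U+FFFD, got {actual}")
```

A decoder that treats `E0 80` as one truncated sequence would show up in the mismatch list for the first case.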

b'\xe0\x00' raises 'unexpected end of data' and not 'invalid continuation byte'. This is a terminological issue.
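The message in question is exposed as the `reason` attribute of `UnicodeDecodeError`, together with the `start` and `end` offsets of the offending bytes. A small sketch for inspecting it follows. `describe_utf8_error` is a hypothetical helper. The exact reason string depends on the Python version being run, so this is a way to check the behavior rather than a statement of what any particular release reports.

```python
def describe_utf8_error(data):
    """Return (reason, start, end) for the first strict UTF-8 error, or None."""
    try:
        data.decode("utf-8")  # the default error handler is 'strict'
    except UnicodeDecodeError as exc:
        # reason is the human-readable message, e.g. the text being discussed here
        return exc.reason, exc.start, exc.end
    return None


if __name__ == "__main__":
    for sample in (b"\xe0", b"\xe0\x00", b"\xe0\x80"):
        print(sample, describe_utf8_error(sample))
```

Printing the reason for `b'\xe0'` (genuinely truncated) next to `b'\xe0\x00'` (followed by a non-continuation byte) makes the distinction easy to see.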

b'\xe0\x80'.decode('utf-8', 'replace') returns one U+FFFD and not two. I don't think that is right.
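The disagreement about one versus two U+FFFD can be made concrete with a pure-Python sketch of the "maximal subpart" substitution practice described in the Unicode standard (the issue title refers to Unicode 5.2.0). The sketch assumes the well-formed UTF-8 byte ranges from chapter 3 of the standard. The helper names are hypothetical, and this is an illustration, not CPython's decoder.

```python
def _second_byte_range(lead):
    """Allowed range for the byte after `lead` (well-formed UTF-8 ranges)."""
    if lead == 0xE0:
        return 0xA0, 0xBF  # excludes overlong 3-byte forms
    if lead == 0xED:
        return 0x80, 0x9F  # excludes encoded surrogates
    if lead == 0xF0:
        return 0x90, 0xBF  # excludes overlong 4-byte forms
    if lead == 0xF4:
        return 0x80, 0x8F  # excludes code points above U+10FFFF
    return 0x80, 0xBF


def _sequence_length(lead):
    """Length of a well-formed sequence starting with `lead`, or 0 if it cannot start one."""
    if lead <= 0x7F:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation bytes, C0, C1, F5..FF


def decode_maximal_subparts(data):
    """Decode bytes as UTF-8, replacing each maximal ill-formed subpart with one U+FFFD."""
    out = []
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        length = _sequence_length(lead)
        if length == 0:
            out.append("\ufffd")
            i += 1
            continue
        if length == 1:
            out.append(chr(lead))
            i += 1
            continue
        # Extend j while data[i:j] is still a prefix of a well-formed sequence.
        j = i + 1
        lo, hi = _second_byte_range(lead)
        while j < i + length and j < n and lo <= data[j] <= hi:
            j += 1
            lo, hi = 0x80, 0xBF  # only the second byte has a restricted range
        if j == i + length:
            out.append(data[i:j].decode("utf-8"))  # complete, well-formed sequence
        else:
            out.append("\ufffd")  # data[i:j] is the maximal subpart
        i = j
    return "".join(out)
```

Under this rule `b'\xe0\x80'` splits into two subparts: `E0` alone (because `80` is outside `A0..BF`) and the lone `80`, so two U+FFFD are produced. A truncated but otherwise valid prefix such as `b'\xe0\xa0'` produces only one.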

title: str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0 -> str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

Python tracker <report@bugs.python.org> <http://bugs.python.org/issue8271>