Message 102523 - Python tracker

I also found out that, according to RFC 3629, surrogates are considered invalid and can't be encoded/decoded, but the UTF-8 codec actually encodes and decodes them.

Python 2 does, but Python 3 raises an error. (...)
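For reference, the Python 3 behaviour mentioned above can be checked with a small sketch like the following. It assumes a current Python 3 interpreter; the names `LONE_SURROGATE`, `try_encode` and `try_decode` are only illustrative helpers, not part of the codec API. The `surrogatepass` error handler is shown as the explicit opt-in for the old behaviour.

```python
# Sketch for a Python 3 interpreter; all names here are illustrative.
LONE_SURROGATE = "\ud800"          # a single high surrogate code point
SURROGATE_BYTES = b"\xed\xa0\x80"  # the 3-byte form that RFC 3629 forbids


def try_encode(text, errors="strict"):
    """Return the UTF-8 bytes, or the name of the exception raised."""
    try:
        return text.encode("utf-8", errors)
    except UnicodeError as exc:
        return type(exc).__name__


def try_decode(data, errors="strict"):
    """Return the decoded string, or the name of the exception raised."""
    try:
        return data.decode("utf-8", errors)
    except UnicodeError as exc:
        return type(exc).__name__


# Strict mode rejects surrogates in both directions.
print(try_encode(LONE_SURROGATE))
print(try_decode(SURROGATE_BYTES))

# The surrogatepass handler lets them through on request.
print(try_encode(LONE_SURROGATE, "surrogatepass"))
print(ascii(try_decode(SURROGATE_BYTES, "surrogatepass")))
```

With the strict handler both calls report a Unicode error, while `surrogatepass` round-trips the surrogate, so code that relies on the old behaviour has to ask for it explicitly.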

I wonder how that change got into the 3.x branch - I would certainly not have approved it for the reasons given further up on this ticket.

I think we should revert that change for Python 3.2.

See r72208 and issue #3672.

pitrou wrote: "We could fix it for 3.1, and perhaps leave 2.7 unchanged if some people rely on this (for whatever reason)."