Message 102523 - Python tracker
I also found out that, according to RFC 3629, surrogates are considered invalid and they can't be encoded/decoded, but the UTF-8 codec actually does it.
Python2 does, but Python3 raises an error. (...)
I wonder how that change got into the 3.x branch - I would certainly not have approved it for the reasons given further up on this ticket.
I think we should revert that change for Python 3.2.
pitrou wrote "We could fix it for 3.1, and perhaps leave 2.7 unchanged if some people rely on this (for whatever reason)."
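
For reference, a minimal sketch of the behaviour under discussion, assuming a current Python 3 interpreter (the exact behaviour of the 3.1/3.2 branches at the time of this message may have differed). The helper names `try_encode` and `try_decode` are made up for illustration. The strict UTF-8 codec rejects a lone surrogate, while the `surrogatepass` error handler allows it through, which is roughly what Python 2 did by default:

```python
# Illustration only: how the UTF-8 codec treats a lone surrogate in Python 3.
# try_encode / try_decode are hypothetical helpers, not part of any library.

LONE_SURROGATE = "\ud800"          # a high surrogate with no matching low surrogate
SURROGATE_BYTES = b"\xed\xa0\x80"  # the 3-byte UTF-8-style encoding of U+D800


def try_encode(text, errors="strict"):
    """Return the encoded bytes, or None if the codec refuses."""
    try:
        return text.encode("utf-8", errors)
    except UnicodeEncodeError:
        return None


def try_decode(data, errors="strict"):
    """Return the decoded string, or None if the codec refuses."""
    try:
        return data.decode("utf-8", errors)
    except UnicodeDecodeError:
        return None


# Strict mode follows RFC 3629 and rejects surrogates in both directions.
strict_encoded = try_encode(LONE_SURROGATE)
strict_decoded = try_decode(SURROGATE_BYTES)

# The "surrogatepass" handler lets surrogates through.
passed_encoded = try_encode(LONE_SURROGATE, "surrogatepass")
passed_decoded = try_decode(SURROGATE_BYTES, "surrogatepass")

print(strict_encoded, strict_decoded)
print(passed_encoded, repr(passed_decoded))
```

Code that relies on the old permissive behaviour would have to opt in explicitly with `errors="surrogatepass"`.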