[Python-Dev] PEP 383 update: utf8b is now the error handler
Michael Urman murman at gmail.com
Thu May 7 16:31:11 CEST 2009
- Previous message: [Python-Dev] PEP 383 update: utf8b is now the error handler
- Next message: [Python-Dev] PEP 383 update: utf8b is now the error handler
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Thu, May 7, 2009 at 01:16, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> I'm still at a loss what name to give it, though. I understand that I have to rename both error handlers, but I'm uncertain what I should rename them to. So proposals that rename only one of them aren't that helpful. It would be helpful if people would indicate support for Antoine's proposal.
Part of the problem is that they both allow byte sequences to decode to invalid Unicode strings. In particular, they both affect the same byte subsequences, which is what brought us to the crossroads where we wanted to name both of them "surrogates". So I'll offer a few more colors and try to stay out of the way of choosing between them or the other proposed ones. :)
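For readers looking at this thread later: in Python 3.1 and newer the two handlers discussed here shipped as "surrogatepass" (the old utf8 behavior) and "surrogateescape" (utf8b). The sketch below assumes one of those releases and decodes the same bytes with each handler, showing that both turn the byte subsequence ED A0 80 into lone surrogates, but in different ways, and that each round-trips its own output.

```python
# Assumes Python 3.1+, where the handlers debated here are named
# "surrogatepass" (old utf8 behavior) and "surrogateescape" (utf8b).
data = b"abc\xed\xa0\x80"  # UTF-8-style encoding of the lone surrogate U+D800

# surrogatepass decodes the 3-byte sequence to the single surrogate it encodes.
lenient = data.decode("utf-8", errors="surrogatepass")
# surrogateescape maps each undecodable byte 0xNN to U+DCNN.
escaped = data.decode("utf-8", errors="surrogateescape")

print(ascii(lenient))  # 'abc\ud800'
print(ascii(escaped))  # 'abc\udced\udca0\udc80'

# Each handler restores the original bytes when used again for encoding.
assert lenient.encode("utf-8", errors="surrogatepass") == data
assert escaped.encode("utf-8", errors="surrogateescape") == data
```

Both results are "invalid" Unicode strings in the sense used above: each contains code points from the surrogate range that a strict UTF-8 encode would reject.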
I haven't come up with anything I like better than errors="lenient" for the handler with the old utf8 behavior; would errors="nonvalidating" be accurate? It still seems to me that a new codec, perhaps "utf8-lenient", reads better.
For the utf8b error handler, I could see any of errors="roundtrip", errors="roundtripreplace", errors="tosurrogate", errors="surrogatereplace", errors="surrogateescape", errors="binaryreplace", errors="binaryescape". This includes Antoine's proposal (sans hyphen).
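Since an error handler's name is only a key in the codecs registry, any of the candidates above could be tried out locally as an alias. The following is a hypothetical sketch: "roundtrip" is just one of the proposed names, registered here by hand with `codecs.register_error` as an alias for the handler that Python 3.1+ actually ships as "surrogateescape".

```python
import codecs

# Hypothetical alias: "roundtrip" is one of the names proposed in this thread,
# not a handler Python provides. It reuses the real surrogateescape handler.
codecs.register_error("roundtrip", codecs.lookup_error("surrogateescape"))

raw = b"caf\xe9"  # Latin-1 bytes; 0xE9 starts an incomplete UTF-8 sequence
text = raw.decode("utf-8", errors="roundtrip")

print(ascii(text))  # 'caf\udce9'
assert text.encode("utf-8", errors="roundtrip") == raw
```

The behavior is identical under either name, which is why the choice here is purely about which name best describes it.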
-- Michael Urman