[Python-Dev] PEP 383 update: utf8b is now the error handler
"Martin v. Löwis" martin at v.loewis.de
Thu May 7 07:43:30 CEST 2009
Michael Urman wrote:
> On Wed, May 6, 2009 at 15:42, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> Despite there being also an error handler called "surrogates".
> Not that I have to be, but I'm not sold on the previous UTF-8 codec behavior becoming an error handler of the name "surrogates" for two reasons (I do respect the obvious PBP argument for the implementation, and have no better name - "lenient"?).
PBP?
> First, unless there's a way to stack error handlers, there's no way to access the old behavior combined with the "replace" handler.
Well, there is a way to stack error handlers, although it's not pretty:
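    import codecs

    # Look up the two existing handlers once, at import time.
    _surrogates = codecs.lookup_error("surrogates")
    _replace = codecs.lookup_error("replace")

    def surrogates_then_replace(exc):
        # Try "surrogates" first; fall back to "replace" if it gives up.
        try:
            return _surrogates(exc)
        except UnicodeError:
            return _replace(exc)

    codecs.register_error("surrogates_then_replace", surrogates_then_replace)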
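The same stacking idea can be sketched as a reusable helper. This is only an illustration under stated assumptions: `chain_error_handlers` and the handler name `escape_then_replace` are hypothetical, and the sketch assumes the utf8b handler is registered under the name `surrogateescape`, as it is in released Python 3 versions.

```python
import codecs

def chain_error_handlers(name, first, second):
    """Register `name` as a handler that tries `first`, then `second`.

    Hypothetical helper for illustration; it is not part of the codecs module.
    """
    first_handler = codecs.lookup_error(first)
    second_handler = codecs.lookup_error(second)

    def chained(exc):
        try:
            return first_handler(exc)
        except UnicodeError:
            # The first handler declined this error, so fall back.
            return second_handler(exc)

    codecs.register_error(name, chained)
    return chained

# Assumption: "surrogateescape" is the registered name of the utf8b handler.
chain_error_handlers("escape_then_replace", "surrogateescape", "replace")

# U+DCFF stands for the escaped byte 0xFF; U+20AC cannot be encoded as ASCII,
# so the first handler declines it and "replace" substitutes "?".
encoded = "a\udcffb\u20ac".encode("ascii", "escape_then_replace")
```

Here the fallback is only reached on encode, for characters the first handler does not cover.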
> The stacking argument also applies to the new utf8b behavior on encode (only, as it handles all errors on decode). This may be a YAGNI.
Indeed - in particular because, in the primary application of this error handler (i.e. file IO operations), there is no way of specifying an additional error handler anyway.
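The asymmetry between decode and encode can be illustrated with a short round trip. This is a sketch, again assuming the utf8b handler is available as `surrogateescape`. The file name is made up for the example.

```python
# Hypothetical file name in Latin-1; the byte 0xE9 is not valid UTF-8 here.
raw = b"caf\xe9.txt"

# Decoding does not fail: the undecodable byte 0xE9 is mapped to U+DCE9.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes.
restored = name.encode("utf-8", "surrogateescape")

# Without the handler, the escaped character cannot be encoded at all.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

Since every undecodable byte is escaped, a second handler would have nothing left to do on decode. It could only matter on encode.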
Regards, Martin