[Python-Dev] PEP 383 update: utf8b is now the error handler

MRAB google at mrabarnett.plus.com
Wed May 6 12:08:45 CEST 2009


M.-A. Lemburg wrote:
> Martin v. Löwis wrote:
>>> The name "utf8b" suggested in the PEP is not in line with the codec
>>> design
>> Where is that design documented, and how exactly does the name violate
>> the design (chapter and verse, please)?
>
> Martin, I designed the whole Python codec machinery, so even
> if this is not explicitly written down somewhere, you can take
> my word for it.
>
> I don't want users to be confused by such an error handler
> name, so please change it!
>
> Here's a list of the currently available error handlers
> (taken from codecs.py):
>
>     The .encode()/.decode() methods may use different error
>     handling schemes by providing the errors argument. These
>     string values are predefined:
>
>      'strict' - raise a ValueError error (or a subclass)
>      'ignore' - ignore the character and continue with the next
>      'replace' - replace with a suitable replacement character;
>                 Python will use the official U+FFFD REPLACEMENT
>                 CHARACTER for the builtin Unicode codecs on
>                 decoding and '?' on encoding.
>      'xmlcharrefreplace' - Replace with the appropriate XML
>                            character reference (only for encoding).
>      'backslashreplace'  - Replace with backslashed escape sequences
>                            (only for encoding).
>
>     The set of allowed values can be extended via register_error.
>
>>> Error handlers and codecs are two different things, so the namespaces
>>> need to be clearly separate.
>> They are separate namespaces; that's guaranteed by the implementation.
>
> In the implementation, yes, but not in the head of a typical
> user: the 'utf8b' looks more like a codec name than an error
> handler name.

Judging by the existing names, I think that 'surrogate' would be
reasonable. It already carries the meaning of "substitute", it's not too
long, and the codes which act as replacements are already called
surrogates.

> I want to avoid any such confusion with Python codecs and
> don't understand why you are making a problem out of this.
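
For readers who have not used the errors argument, the predefined handlers quoted above and the register_error hook can be sketched roughly as follows. This is only an illustration, assuming Python 3; the sample string, the handler name "example-underscore", and the function underscore_handler are made up for the example and appear nowhere in the PEP or in this thread.

```python
import codecs

# Sample text with two characters that ASCII cannot encode:
# U+00EF (i with diaeresis) and U+2603 (snowman).
text = "na\u00efve \u2603"

# 'strict' raises UnicodeEncodeError, which is a ValueError subclass.
try:
    text.encode("ascii", errors="strict")
    strict_result = None
except ValueError as exc:
    strict_result = type(exc).__name__

# The remaining predefined handlers, as listed in codecs.py.
ignored = text.encode("ascii", errors="ignore")
replaced = text.encode("ascii", errors="replace")
xml_refs = text.encode("ascii", errors="xmlcharrefreplace")
backslashed = text.encode("ascii", errors="backslashreplace")

# On decoding, 'replace' substitutes U+FFFD for the undecodable byte.
decoded = b"abc\xff".decode("utf-8", errors="replace")


def underscore_handler(exc):
    """Hypothetical handler: emit '_' for each unencodable character."""
    if isinstance(exc, UnicodeEncodeError):
        # Return the replacement text and the position to resume from.
        return ("_" * (exc.end - exc.start), exc.end)
    raise exc


# Handler names live in their own registry, separate from codec names.
codecs.register_error("example-underscore", underscore_handler)
custom = text.encode("ascii", errors="example-underscore")

print(strict_result, ignored, replaced, xml_refs, backslashed, custom)
```

The name passed to register_error is looked up only through the errors argument and never as an encoding name, which is the implementation-level separation mentioned above; whether a given name reads like a codec to a user is a separate question.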
