[Python-Dev] Bytes path related questions for Guido

Walter Dörwald walter at livinglogic.de
Fri Aug 29 12:09:54 CEST 2014


On 28 Aug 2014, at 19:54, Glenn Linderman wrote:

> On 8/28/2014 10:41 AM, R. David Murray wrote:
>> On Thu, 28 Aug 2014 10:15:40 -0700, Glenn Linderman <v+python at g.nevcal.com> wrote:
> [...]
>> Also for cases where the data stream is supposed to be in a given
>> encoding, but contains undecodable bytes. Showing the stuff that
>> incorrectly decodes as whatever it decodes to is generally what you
>> want in that case.
> Sure, people can learn to recognize mojibake for what it is, and maybe
> even learn to recognize it for what it was intended to be, in limited
> domains. But suppressing/replacing the surrogates doesn't help with
> that... would it not be better to replace the surrogates with an
> escape sequence that shows the original, undecodable, byte value?
> Like \xNN ?

For that we could extend the "backslashreplace" codec error callback, so that it can be used for decoding too, not just for encoding. I.e.

b"a\xffb".decode("utf-8", "backslashreplace")

would return

"a\\xffb"
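
Until "backslashreplace" itself handles decoding, roughly the same behaviour can be approximated with a custom error callback registered through codecs.register_error. The following is only a minimal sketch: the handler name "backslashreplace_decode" is made up for illustration, and it deliberately handles decoding errors only.

```python
import codecs


def backslashreplace_decode(exc):
    """Replace each undecodable byte with a \\xNN escape (decoding only)."""
    # Hypothetical handler: anything other than a decoding error is re-raised.
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    # exc.object is the original bytes; start/end delimit the offending bytes.
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("\\x%02x" % byte for byte in bad)
    # Return the replacement text and the position where decoding resumes.
    return replacement, exc.end


# The registered name is an assumption chosen for this sketch.
codecs.register_error("backslashreplace_decode", backslashreplace_decode)

result = b"a\xffb".decode("utf-8", "backslashreplace_decode")
print(result)  # a\xffb
```

The callback returns a (replacement, resume position) tuple, which is the contract the codec machinery expects from any error handler, so the valid bytes around the bad one are decoded normally.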

Servus, Walter


