[Python-Dev] Bytes path related questions for Guido

Antoine Pitrou antoine at python.org
Sun Aug 24 16:23:52 CEST 2014


On 24/08/2014 09:04, Nick Coghlan wrote:

On 24 August 2014 14:44, Nick Coghlan <ncoghlan at gmail.com> wrote:

2. Should we add some additional helpers to the string module for dealing with surrogate escaped bytes and other techniques for smuggling arbitrary binary data as text?

My proposal [3] is to add:

* string.escapedsurrogates (constant with the 128 escaped code points)
* string.clean(s): replaces surrogates with '\ufffd' or another specified code point
* string.redecode(s, encoding): encodes a string back to bytes and then decodes it again using the specified encoding (the old encoding defaults to 'latin-1' to match the assumptions in WSGI)

Serhiy & Ezio convinced me to scale this one back to a proposal for "codecs.cleansurrogateescapes(s)", which replaces surrogates that may be produced by surrogateescape (that's what string.clean() above was supposed to be, but my description was not correct, and the name was too vague for that error to be obvious to the reader).
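For readers following along, here is a rough sketch of what the helpers described above might look like in pure Python. None of these names exist in the standard library; ESCAPED_SURROGATES, clean_surrogate_escapes and redecode are hypothetical spellings chosen for illustration, and the strict error handling in redecode is an assumption, not part of the proposal.

```python
# Hypothetical sketch of the proposed helpers; nothing here is a stdlib API.

# surrogateescape maps each undecodable byte 0x80..0xFF to U+DC80..U+DCFF,
# which gives the 128 "escaped" code points mentioned in the proposal.
ESCAPED_SURROGATES = "".join(chr(cp) for cp in range(0xDC80, 0xDD00))


def clean_surrogate_escapes(s, replacement="\ufffd"):
    """Replace every code point that surrogateescape may have produced."""
    table = {ord(ch): replacement for ch in ESCAPED_SURROGATES}
    return s.translate(table)


def redecode(s, encoding, old_encoding="latin-1"):
    """Encode s back to bytes with old_encoding, then decode with encoding.

    Assumption: both steps use the default strict error handler.
    """
    return s.encode(old_encoding).decode(encoding)


# b"caf\xe9" is not valid UTF-8, so the last byte is smuggled as U+DCE9.
text = b"caf\xe9".decode("utf-8", "surrogateescape")
cleaned = clean_surrogate_escapes(text)

# UTF-8 bytes that were wrongly decoded as latin-1 can be decoded again.
fixed = redecode("caf\u00c3\u00a9", "utf-8")
```

Note that the cleaning step is one-way: once U+DCE9 has been replaced, the original byte 0xE9 cannot be recovered from the string.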

"clean" conveys the wrong meaning. It should use a scary word such as "trap". "Cleaning" surrogates is unlikely to be the right procedure when dealing with surrogates produced by undecodable byte sequences.
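To make the concern concrete, here is a small sketch contrasting the two behaviours. The function name trap_surrogate_escapes is hypothetical, and treating "trap" as "raise an error" is only one possible reading of the suggestion; the file name is an invented example.

```python
# A file name containing a byte that is not valid UTF-8 (invented example).
raw = b"report-\xff.txt"

# surrogateescape smuggles the bad byte as U+DCFF, so the round trip is exact.
name = raw.decode("utf-8", "surrogateescape")
round_tripped = name.encode("utf-8", "surrogateescape")

# "Cleaning" replaces the surrogate, and the original byte is gone for good.
cleaned = name.replace("\udcff", "\ufffd")
lossy = cleaned.encode("utf-8")


def trap_surrogate_escapes(s):
    """Hypothetical helper: refuse strings that carry escaped bytes."""
    for index, ch in enumerate(s):
        if "\udc80" <= ch <= "\udcff":
            raise ValueError(
                "escaped byte 0x%02x at index %d" % (ord(ch) - 0xDC00, index)
            )
    return s


try:
    trap_surrogate_escapes(name)
    trapped = False
except ValueError:
    trapped = True
```

Here round_tripped equals raw, while lossy does not, which is why silently replacing the surrogates can point the program at a different (or nonexistent) file.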

Regards

Antoine.


