[Python-Dev] Bytes path related questions for Guido

Nick Coghlan ncoghlan at gmail.com
Sun Aug 24 17:26:43 CEST 2014


On 25 August 2014 00:23, Antoine Pitrou <antoine at python.org> wrote:

On 24/08/2014 09:04, Nick Coghlan wrote:

Serhiy & Ezio convinced me to scale this one back to a proposal for "codecs.clean_surrogate_escapes(s)", which replaces surrogates that may be produced by surrogateescape. (That's what string.clean() above was supposed to be, but my description was incorrect, and the name was too vague for that error to be obvious to the reader.)

"clean" conveys the wrong meaning. It should use a scary word such as "trap". "Cleaning" surrogates is unlikely to be the right procedure when dealing with surrogates produced by undecodable byte sequences.

"purge_surrogate_escapes" was the other term that occurred to me.

Either way, my use case is to filter them out when I don't want to pass them along to other software, but would prefer the Unicode replacement character to the ASCII question mark created by using the "replace" filter when encoding.
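As a rough illustration of that use case, here is a minimal sketch. The helper below is hypothetical: it borrows the "purge_surrogate_escapes" name floated above, is not an existing codecs function, and assumes that only the lone surrogates U+DC80..U+DCFF (the ones the surrogateescape error handler produces) should be replaced.

```python
import re

# The "surrogateescape" error handler maps each undecodable byte
# 0x80..0xFF to a lone surrogate in the range U+DC80..U+DCFF.
_ESCAPED_BYTES = re.compile("[\udc80-\udcff]")


def purge_surrogate_escapes(s, replacement="\ufffd"):
    """Hypothetical helper (not part of the stdlib): replace escaped
    bytes with the Unicode replacement character U+FFFD."""
    return _ESCAPED_BYTES.sub(replacement, s)


raw = b"caf\xe9.txt"                           # Latin-1 bytes, not valid UTF-8
name = raw.decode("utf-8", "surrogateescape")  # 'caf\udce9.txt'

# Purging first keeps the text encodable and marks the damage with U+FFFD.
cleaned = purge_surrogate_escapes(name)        # 'caf\ufffd.txt'

# By contrast, the "replace" handler substitutes an ASCII question mark
# at encoding time.
question = name.encode("utf-8", "replace")     # b'caf?.txt'
```

The purged string can then be encoded with the default strict handler, since it no longer contains lone surrogates.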

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia


