[Python-Dev] Bytes path related questions for Guido

Stephen J. Turnbull stephen at xemacs.org
Tue Aug 26 04:11:31 CEST 2014


Nick Coghlan writes:

> "purge_surrogate_escapes" was the other term that occurred to me.

"purge" suggests removal, not replacement. That may be useful too.

neutralize_surrogate_escapes(s, remove=False, replacement='\uFFFD')

maybe? (Of course the remove argument is feature creep, so I'm only about +0.5 myself. And the name is long, but I can't think of any better synonyms for "make safe" in English right now.)

> Either way, my use case is to filter them out when I don't want to pass them along to other software, but would prefer the Unicode replacement character to the ASCII question mark created by using the "replace" filter when encoding.

I think it would be preferable to be unicodely correct here by default, since this is a str -> str function.
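
For illustration only, here is a minimal sketch of what such a str -> str helper might look like. No function of this name exists in the standard library; the name and signature are taken from the suggestion above, and the body is an assumption (it treats only the lone surrogates U+DC80..U+DCFF, the range the "surrogateescape" error handler produces, as escapes).

```python
# Hypothetical helper: sketch of the proposed API, not an existing stdlib function.
def neutralize_surrogate_escapes(s, remove=False, replacement='\ufffd'):
    """Replace (or drop) surrogate escapes in s, returning a new str."""
    # "surrogateescape" maps undecodable bytes 0x80..0xFF to U+DC80..U+DCFF.
    # In a translate table, None deletes the character.
    target = None if remove else replacement
    table = dict.fromkeys(range(0xDC80, 0xDD00), target)
    return s.translate(table)


# A Latin-1 encoded file name decoded as UTF-8 picks up one surrogate escape.
raw = b'caf\xe9.txt'
name = raw.decode('utf-8', 'surrogateescape')

cleaned = neutralize_surrogate_escapes(name)               # U+FFFD in place of the escape
removed = neutralize_surrogate_escapes(name, remove=True)  # escape dropped entirely

# For comparison: the "replace" error handler substitutes an ASCII '?'
# at encoding time, and the result is bytes rather than str.
question_marked = name.encode('utf-8', 'replace')
```

With the default arguments the result stays a str and can be encoded as ordinary UTF-8 afterwards, whereas the "replace" handler only takes effect while encoding and yields '?'.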


