[Python-Dev] Cleaning up surrogate escaped strings (was Bytes path related questions for Guido)
Nick Coghlan ncoghlan at gmail.com
Thu Aug 28 14:26:16 CEST 2014
- Previous message: [Python-Dev] pip enhancement
- Next message: [Python-Dev] Cleaning up surrogate escaped strings (was Bytes path related questions for Guido)
On 26 Aug 2014 21:34, "MRAB" <python at mrabarnett.plus.com> wrote:
> On 2014-08-26 03:11, Stephen J. Turnbull wrote:
>> Nick Coghlan writes:
>>> "purge_surrogate_escapes" was the other term that occurred to me.
>>
>> "purge" suggests removal, not replacement. That may be useful too.
>>
>>     neutralize_surrogate_escapes(s, remove=False, replacement='\uFFFD')
>
> How about:
>
>     replace_surrogate_escapes(s, replacement='\uFFFD')
>
> If you want them removed, just pass an empty string as the replacement.
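MRAB's suggested signature (written here as replace_surrogate_escapes) is easy to prototype. The sketch below is hypothetical: no such function exists in the standard library, and it assumes the "escapes" to clean are exactly the lone surrogates U+DC80..U+DCFF that the surrogateescape error handler produces for undecodable bytes 0x80..0xFF.

```python
def replace_surrogate_escapes(s, replacement='\ufffd'):
    """Hypothetical helper: replace every surrogate-escaped byte in s.

    Only U+DC80..U+DCFF are touched; any other lone surrogates are left
    as they are. Pass replacement='' to remove the escapes entirely.
    """
    # str.translate maps each code point in the table to the given string.
    table = dict.fromkeys(range(0xDC80, 0xDD00), replacement)
    return s.translate(table)


text = b'caf\xe9 \xff'.decode('utf-8', 'surrogateescape')
print(ascii(text))                                 # 'caf\udce9 \udcff'
print(ascii(replace_surrogate_escapes(text)))      # 'caf\ufffd \ufffd'
print(ascii(replace_surrogate_escapes(text, '')))  # 'caf '
```

Using str.translate rather than a regex keeps the replacement string literal, so a replacement containing backslashes is inserted as-is.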
The current proposal on the issue tracker is to instead take advantage of the existing error handlers:
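    def convert_surrogateescape(data, errors='replace'):
        # Re-encode to recover the original raw bytes (U+DC80..U+DCFF
        # become 0x80..0xFF), then decode again, letting the chosen
        # error handler deal with the bytes that are not valid UTF-8.
        return data.encode('utf-8', 'surrogateescape').decode('utf-8', errors)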
That code is short but semantically dense; it took a few iterations to arrive at that version. (Added bonus: once you're alerted to the possibility, it's trivial to write your own version for existing Python 3 releases. The standard name just makes it easier to look up when you come across it in a piece of code, and provides the option of optimising it later if that ever seems worth the extra work.)
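A quick check of how convert_surrogateescape behaves with a few of the existing error handlers. This sketch assumes the escaped string came from decoding UTF-8 bytes with surrogateescape (as os.fsdecode() does when the filesystem encoding is UTF-8); with a different source encoding, the 'utf-8' codec in the function would need to change to match. Note that the encode step raises UnicodeEncodeError if the string contains lone surrogates outside U+DC80..U+DCFF.

```python
def convert_surrogateescape(data, errors='replace'):
    # Recover the original bytes, then decode them again with `errors`.
    return data.encode('utf-8', 'surrogateescape').decode('utf-8', errors)


# Simulate a name decoded with surrogateescape from non-UTF-8 bytes.
text = b'caf\xe9.txt'.decode('utf-8', 'surrogateescape')
print(ascii(text))                                     # 'caf\udce9.txt'
print(ascii(convert_surrogateescape(text)))            # 'caf\ufffd.txt'
print(ascii(convert_surrogateescape(text, 'ignore')))  # 'caf.txt'
```

Passing errors='strict' instead turns any surviving escape into a UnicodeDecodeError, which is a simple way to detect them.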
I also filed a separate RFE to make backslashreplace usable on input, since that allows the option of separating the replacement operation from the encoding operation.
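One reading of that separation, sketched under the assumption of Python 3.5 or later (where 'backslashreplace' is accepted when decoding as well as encoding): first replace the escaped bytes with visible \xNN text, then encode the surrogate-free result with whatever codec is needed, strictly.

```python
# Requires Python 3.5+, where 'backslashreplace' also works when decoding.
text = b'caf\xe9'.decode('utf-8', 'surrogateescape')  # 'caf\udce9'

# Step 1 (replacement): each undecodable byte becomes a literal '\xNN'.
cleaned = text.encode('utf-8', 'surrogateescape').decode(
    'utf-8', 'backslashreplace')
print(cleaned)                  # caf\xe9

# Step 2 (encoding): no surrogates remain, so a strict encode succeeds.
print(cleaned.encode('ascii'))  # b'caf\\xe9'
```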
Cheers,
Nick.