[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Toshio Kuratomi a.badger at gmail.com
Wed Apr 29 04:39:20 CEST 2009


Martin v. Löwis wrote:

Since the serialization of the Unicode string is likely to use UTF-8, and the string for such a file will include half surrogates, the application may raise an exception when encoding the names for a configuration file. These encoding exceptions will be as rare as the unusual names (which the careful I18N aware developer has probably eradicated from his system), and thus will appear late.

There are trade-offs to any solution; if there was a solution without trade-offs, it would be implemented already. The Python UTF-8 codec will happily encode half-surrogates; people argue that it is a bug that it does so, however, it would help in this specific case.

Can we use this encoding scheme for writing into files as well? We've turned the filename with undecodable bytes into a string with half surrogates. Putting that string into a file has to turn them into bytes at some level. Can we use the python-escape error handler to achieve that somehow?
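To make the question concrete, here is a rough sketch of the round trip. It assumes the handler name `surrogateescape`, which is what the python-escape handler was called when PEP 383 shipped in Python 3.1; the file name and the UTF-8 encoding are made up for illustration. In current Python 3 releases the strict UTF-8 codec rejects half surrogates, so the sketch also shows the encoding exception that was raised as a concern above.

```python
import os
import tempfile

# Hypothetical file name as a POSIX filesystem might return it:
# the byte 0xFF can never appear in valid UTF-8.
raw_name = b"caf\xff.txt"

# Decoding with the handler maps each undecodable byte 0xNN to the
# half surrogate U+DCNN, so 0xFF becomes U+DCFF.
name = raw_name.decode("utf-8", errors="surrogateescape")

# Strict UTF-8 encoding of that string fails in current Python 3.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Writing through a text file opened with the same handler turns the
# half surrogates back into the original bytes.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "w", encoding="utf-8", errors="surrogateescape") as f:
        f.write(name)
    with open(path, "rb") as f:
        on_disk = f.read()
finally:
    os.remove(path)
```

The catch is that the resulting file is no longer valid UTF-8, so whatever reads it back needs to use the same error handler.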

-Toshio
