[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
Toshio Kuratomi a.badger at gmail.com
Wed Apr 29 04:09:42 CEST 2009
Zooko O'Whielacronx wrote:
> On Apr 28, 2009, at 6:46 AM, Hrvoje Niksic wrote:
>
>> If you switch to iso8859-15 only in the presence of undecodable
>> UTF-8, then you have the same round-trip problem as the PEP: both
>> b'\xff' and b'\xc3\xbf' will be converted to u'\u00ff' without a
>> way to unambiguously recover the original file name.
>
> Why do you say that?  It seems to work as I expected here:
>
> >>> '\xff'.decode('iso-8859-15')
> u'\xff'
> >>> '\xc3\xbf'.decode('iso-8859-15')
> u'\xc3\xbf'
>
> >>> '\xff'.decode('cp1252')
> u'\xff'
> >>> '\xc3\xbf'.decode('cp1252')
> u'\xc3\xbf'
You're not showing that this is a fallback path: your example decodes both names directly with the one-byte encoding. What won't work is first trying a local encoding (in the following example, utf-8) and then, if that doesn't work, trying a one-byte encoding like iso8859-15:
    try:
        file1 = '\xff'.decode('utf-8')
    except UnicodeDecodeError:
        file1 = '\xff'.decode('iso-8859-15')
    print repr(file1)

    try:
        file2 = '\xc3\xbf'.decode('utf-8')
    except UnicodeDecodeError:
        file2 = '\xc3\xbf'.decode('iso-8859-15')
    print repr(file2)

That prints:

    u'\xff'
    u'\xff'
The two encodings can map different bytes to the same Unicode code point, so you can't do this type of thing without recording what encoding was used in the translation.
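
To make the "recording what encoding was used" point concrete, here is a minimal sketch in Python 3 syntax (bytes literals and str instead of the Python 2 forms above). The helper name decode_with_fallback and the idea of returning the encoding alongside the text are assumptions for illustration only, not something proposed in the PEP or earlier in this thread:

```python
def decode_with_fallback(raw, primary="utf-8", fallback="iso-8859-15"):
    """Decode raw bytes, returning (text, encoding_used).

    Hypothetical helper: tries the primary encoding first and falls
    back to a one-byte encoding only when the primary decode fails.
    """
    try:
        return raw.decode(primary), primary
    except UnicodeDecodeError:
        return raw.decode(fallback), fallback


# Two different byte strings standing in for two different file names.
name1, enc1 = decode_with_fallback(b"\xff")      # invalid UTF-8, so falls back
name2, enc2 = decode_with_fallback(b"\xc3\xbf")  # valid UTF-8 for U+00FF

# The decoded text alone is ambiguous: both names collapse to U+00FF.
collision = (name1 == name2)

# Only with the recorded encoding can each original be recovered.
original1 = name1.encode(enc1)
original2 = name2.encode(enc2)
```

Without the second element of each tuple there is nothing left to tell the two names apart, which is the round-trip problem Hrvoje described.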
-Toshio