[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
Zooko O'Whielacronx zooko at zooko.com
Tue Apr 28 20:51:43 CEST 2009
On Apr 28, 2009, at 6:46 AM, Hrvoje Niksic wrote:
> Are you proposing to unconditionally encode file names as
> iso8859-15, or to do so only when undecodable bytes are
> encountered?
For what it is worth, what we have previously planned to do for the
Tahoe project is the second of these -- decode using some 1-byte
encoding such as iso-8859-1, iso-8859-15, or windows-1252, but only
if decoding the bytes with the local alleged encoding fails.
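
A minimal sketch of that fallback, written in Python 3 syntax. The function name, the default of UTF-8 for the local encoding, and the choice to return the encoding used are assumptions made for illustration, not Tahoe's actual code.

```python
def decode_filename(raw, local_encoding="utf-8", fallback="iso-8859-15"):
    """Decode with the local encoding; use the 1-byte fallback only on failure.

    Returns (text, encoding_used). Name and defaults are hypothetical.
    """
    try:
        return raw.decode(local_encoding), local_encoding
    except UnicodeDecodeError:
        # Python's iso-8859-15 codec maps every byte value to a character,
        # so this second decode cannot raise.
        return raw.decode(fallback), fallback


# A name that is valid UTF-8 is decoded with the local encoding.
text, used = decode_filename("caf\u00e9".encode("utf-8"))
assert (text, used) == ("caf\u00e9", "utf-8")

# A Latin-1 style name is not valid UTF-8, so the fallback is used.
text, used = decode_filename(b"caf\xe9")
assert (text, used) == ("caf\u00e9", "iso-8859-15")
```

Returning the encoding alongside the text is one way for a caller to tell which path was taken; the text alone does not carry that information.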
> If you switch to iso8859-15 only in the presence of undecodable
> UTF-8, then you have the same round-trip problem as the PEP: both
> b'\xff' and b'\xc3\xbf' will be converted to u'\u00ff' without a
> way to unambiguously recover the original file name.
Why do you say that? It seems to work as I expected here:
>>> '\xff'.decode('iso-8859-15')
u'\xff'
>>> '\xc3\xbf'.decode('iso-8859-15')
u'\xc3\xbf'
>>> '\xff'.decode('cp1252')
u'\xff'
>>> '\xc3\xbf'.decode('cp1252')
u'\xc3\xbf'
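
For comparison, the session above decodes both byte strings with the same 1-byte codec, whereas the quoted concern is about the conditional scheme, where the codec depends on whether the first decode succeeds. The following Python 3 sketch puts the two cases side by side. It assumes UTF-8 is the local encoding, as in the quoted example, and the helper name is hypothetical.

```python
def conditional_decode(raw):
    # Hypothetical helper: try UTF-8 first, ISO-8859-15 only if that fails.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-15")


names = (b"\xff", b"\xc3\xbf")

# Same codec for both inputs: the decoded strings stay distinct.
same_codec = [raw.decode("iso-8859-15") for raw in names]
assert same_codec == ["\u00ff", "\u00c3\u00bf"]

# Conditional scheme: b"\xff" is invalid UTF-8 and takes the fallback,
# while b"\xc3\xbf" is the valid UTF-8 encoding of U+00FF.
conditional = [conditional_decode(raw) for raw in names]
assert conditional == ["\u00ff", "\u00ff"]
```

Under these assumptions the two inputs decode to the same text in the conditional case, so the text alone is not enough to recover the original bytes.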
Regards,
Zooko