Issue 8308: raw_bytes.decode('cp932') -- spurious mappings
Created on 2010-04-03 23:40 by sjmachin, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (3) | ||
---|---|---|
msg102308 - (view) | Author: John Machin (sjmachin) | Date: 2010-04-03 23:40 |
According to the following references, the bytes 80, A0, FD, FE, and FF are not defined in cp932:
http://msdn.microsoft.com/en-au/goglobal/cc305152.aspx
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&s=ALL
However, CPython 3.1.2 does this:
>>> print(ascii(b'\x80\xa0\xfd\xfe\xff'.decode('cp932')))
'\x80\uf8f0\uf8f1\uf8f2\uf8f3'
(as do 2.5, 2.6, and 2.7 with the appropriate syntax)
This maps 80 to U+0080 (not very useful) and maps the other 4 bytes into the Private Use Area ("PUA")!! Each case should be treated as undefined/unexpected/error/... | ||
msg102321 - (view) | Author: Martin v. Löwis (loewis) | Date: 2010-04-04 06:59 |
This mapping is in conformance with the de facto standard of that encoding, Microsoft Windows; see:
http://www.autumn.org/etc/unidif.html
http://mail.python.org/pipermail/i18n-sig/2003-June/001598.html
http://homepage1.nifty.com/nomenclator/perl/ShiftJIS-CP932-MapUTF.html | ||
msg102334 - (view) | Author: John Machin (sjmachin) | Date: 2010-04-04 11:56 |
Thanks, Martin. Issue closed as far as I'm concerned. |
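Since the issue was closed as "wont fix", callers who want the five bytes from msg102308 treated as errors have to check for them themselves. The following is a minimal sketch of one way to do that, not something proposed in this thread: `find_suspect_chars` and `decode_cp932_checked` are hypothetical helper names, and the set of code points is taken directly from the output quoted in msg102308. It inspects the decoded text rather than the raw bytes, because 80 and A0 can also occur as the second byte of a valid double-byte sequence.

```python
# Code points that, per msg102308, CPython's cp932 decoder produces for the
# single bytes 80, A0, FD, FE and FF.
SUSPECT_CODE_POINTS = frozenset("\x80\uf8f0\uf8f1\uf8f2\uf8f3")


def find_suspect_chars(text):
    """Return (index, character) pairs for every suspect code point in text."""
    return [(i, ch) for i, ch in enumerate(text) if ch in SUSPECT_CODE_POINTS]


def decode_cp932_checked(raw):
    """Decode cp932 bytes, rejecting output that contains a suspect code point.

    Hypothetical helper; it is not part of the standard library.
    """
    text = raw.decode("cp932")
    hits = find_suspect_chars(text)
    if hits:
        listed = ", ".join("U+%04X at index %d" % (ord(ch), i) for i, ch in hits)
        raise ValueError("suspect cp932 mappings: " + listed)
    return text
```

Raising `ValueError` keeps the sketch short; because `UnicodeDecodeError` is a subclass of `ValueError`, a caller that catches `ValueError` handles both this check and the codec's own decoding errors.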
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:59 | admin | set | github: 52555 |
2010-04-04 13:21:54 | r.david.murray | set | status: open -> closed; resolution: wont fix; stage: test needed -> resolved |
2010-04-04 11:56:40 | sjmachin | set | messages: + |
2010-04-04 06:59:13 | loewis | set | nosy: + loewis; messages: + |
2010-04-03 23:44:02 | ezio.melotti | set | priority: normal; nosy: + ezio.melotti; stage: test needed |
2010-04-03 23:40:16 | sjmachin | create |