[Python-Dev] PEP 383 update: utf8b is now the error handler
Lino Mastrodomenico l.mastrodomenico at gmail.com
Wed May 6 12:22:50 CEST 2009
2009/5/6 Antoine Pitrou <solipsis at pitrou.net>:
> By the way, what are the ASCII characters that are not supported by Shift-JIS? Not many I suppose? (if I read the Wikipedia entry correctly, it's only the backslash and the tilde).
The biggest problem with Shift-JIS is that a perfectly valid Unicode character above 127 can be encoded to a byte sequence that includes bytes in range(128), i.e. bytes that are indistinguishable from ASCII.
E.g. the character 掛 (a.k.a. '\u639b'), when encoded with Shift-JIS, becomes the two-byte sequence b'\x8a|'. Notice that the second byte is 124 (0x7C), which on POSIX is usually interpreted as the pipe character and can have security implications.
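A quick way to check this at the interpreter, as a minimal sketch that assumes Python 3 and its built-in "shift_jis" codec (the variable names are only for illustration):

```python
# Encode U+639B with Python's built-in Shift-JIS codec.
ch = "\u639b"
encoded = ch.encode("shift_jis")

# The trail byte falls in the ASCII range and equals the pipe character.
trail_byte = encoded[1]
trail_is_ascii = trail_byte < 128
looks_like_pipe = encoded[1:2] == b"|"

# A naive byte-level scan for b"|" therefore matches inside the character.
naive_scan_hits = b"|" in encoded

print(encoded, trail_byte, trail_is_ascii, looks_like_pipe, naive_scan_hits)
```

Any code that scans the raw bytes for shell metacharacters, without decoding first, would see a pipe here even though the original string contains none.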
It's a known problem with Shift-JIS and was fixed in UTF-8, where every byte of a multi-byte sequence is above 127.
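To compare the two encodings side by side, here is a small sketch, again assuming Python 3 and its built-in codecs. The helper ascii_bytes_in_encoding is a hypothetical name made up for this example; it is not part of any library.

```python
def ascii_bytes_in_encoding(text, encoding):
    """Collect bytes below 128 produced when encoding the non-ASCII characters of text."""
    return [
        b
        for ch in text
        if ord(ch) > 127            # only look at non-ASCII characters
        for b in ch.encode(encoding)
        if b < 128                  # keep bytes that collide with ASCII
    ]

sample = "\u639b"

# UTF-8 sets the high bit on every byte of a multi-byte sequence.
utf8_leaks = ascii_bytes_in_encoding(sample, "utf-8")

# Shift-JIS allows trail bytes in the ASCII range.
sjis_leaks = ascii_bytes_in_encoding(sample, "shift_jis")

print(utf8_leaks, sjis_leaks)
```

For this sample the UTF-8 list is empty, while the Shift-JIS list contains 124, the pipe byte.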
-- Lino Mastrodomenico