[Python-Dev] (Not) delaying the 3.2 release
Nick Coghlan ncoghlan at gmail.com
Fri Sep 17 15:40:15 CEST 2010
On Fri, Sep 17, 2010 at 5:43 AM, Martin (gzlist) <gzlist at googlemail.com> wrote:
> In the example I gave, 十 encodes in CP932 as '\x8f\\', and the function gets confused by the second byte. Obviously the right answer there is just to use unicode, rather than write a function that works with weird multibyte codecs.
That does make it clear that "ASCII superset" is an inaccurate term - a better phrase is "ASCII compatible", since that correctly includes multibyte codecs like UTF-8, which explicitly ensure that the byte values in multibyte characters are all outside the 0x00 to 0x7F range of ASCII.
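The "ASCII compatible" property can be probed empirically: ASCII characters must encode to the same single byte, and no byte of a non-ASCII character may fall in 0x00-0x7F. The helper name `looks_ascii_compatible` and the sample string are illustrative assumptions; a sample-based check like this can only find counterexamples, not prove that an encoding is compatible for every character.

```python
# Sketch only: looks_ascii_compatible() is a hypothetical helper, and a
# sample-based check can find counterexamples but cannot prove that an
# encoding is ASCII compatible for every character.

def looks_ascii_compatible(encoding: str, sample: str) -> bool:
    """Return False if `sample` shows `encoding` is not ASCII compatible."""
    for ch in sample:
        encoded = ch.encode(encoding)
        if ord(ch) < 0x80:
            # ASCII characters must encode to the same single ASCII byte.
            if encoded != ch.encode("ascii"):
                return False
        elif any(byte < 0x80 for byte in encoded):
            # A non-ASCII character produced a byte in the 0x00-0x7F range.
            return False
    return True


# ASCII plus three characters whose CP932 trail byte is 0x5C ('\\').
SAMPLE = "a/十表ソ"

print("十".encode("cp932"))  # b'\x8f\\'
for enc in ("utf-8", "euc_jp", "cp932", "utf-16"):
    print(enc, looks_ascii_compatible(enc, SAMPLE))
# utf-8 True, euc_jp True, cp932 False, utf-16 False
```

UTF-16 fails on the ASCII half of the test (a BOM plus a 0x00 byte for 'a'), while CP932 fails on the multibyte half, matching the two ways an encoding can fall outside the definition.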
So the domain of any polymorphic text manipulation functions we define would be:
- Unicode strings
- byte sequences where the encoding is either:
  - a single byte ASCII superset (e.g. iso-8859-*, cp1252, koi8, mac*)
  - an ASCII compatible multibyte encoding (e.g. UTF-8, EUC-JP, EUC-CN/KR/TW)
Passing in byte sequences that are encoded using an ASCII incompatible multibyte encoding (e.g. CP932, UTF-7, UTF-16, UTF-32, shift-JIS, big5, iso-2022-*) or a single byte encoding that is not an ASCII superset (e.g. EBCDIC) will have undefined results.
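To make that domain concrete, here is a hypothetical polymorphic helper (`last_component` is an illustrative name, not a proposed API, and Martin's actual function isn't shown in this message) that splits on a backslash and picks a str or bytes separator to match its argument. It behaves identically on Unicode strings and on UTF-8 or EUC-JP bytes, but on CP932 bytes the trailing 0x5C byte of 十 is mistaken for the separator, which is exactly the kind of undefined result described above.

```python
from typing import AnyStr


def last_component(path: AnyStr) -> AnyStr:
    """Return the text after the final backslash in `path`.

    Hypothetical polymorphic helper: the separator literal is picked to
    match the argument type, so str and bytes both work, provided any
    bytes use an ASCII compatible encoding.
    """
    sep = b"\\" if isinstance(path, bytes) else "\\"
    return path.rpartition(sep)[2]


name = "dir\\十"
print(last_component(name))                   # 十
print(last_component(name.encode("utf-8")))   # b'\xe5\x8d\x81'
print(last_component(name.encode("euc_jp")))
# CP932 encodes 十 as b'\x8f\\', so its trail byte is taken for the separator:
print(last_component(name.encode("cp932")))   # b''
```

Choosing the separator from the argument type is what lets one function body serve both domains; it does nothing to detect an incompatible encoding, which is why such inputs are declared out of scope rather than rejected.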
I think that's still a big enough win to be worth doing, particularly as more and more of the other variable width multibyte encodings are phased out in favour of UTF-8.
Cheers, Nick.
P.S. Hey Barry, is there anyone at Canonical you can poke about https://bugs.launchpad.net/xorg-server/+bug/531208? Tinkering with this stuff on Kubuntu would be significantly less annoying if I could easily type arbitrary Unicode characters into Konsole ;)
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia