Issue 7090: encoding unicode objects greater than FFFF
Created on 2009-10-09 09:12 by msaghaei, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (2) | ||
---|---|---|
msg93780 | Author: Mahmoud (msaghaei) | Date: 2009-10-09 09:12 |
Odd behaviour with str.encode, codecs.Codec.encode, or similar functions when dealing with unicode objects above U+FFFF. With 2.6: >>> u'\u10380'.encode('utf') returns '\xe1\x80\xb80'. With 3.x: >>> '\u10380'.encode('utf') returns '\xe1\x80\xb80'. The correct output must be: \xf0\x90\x8e\x80 | ||
msg93781 | Author: Ezio Melotti (ezio.melotti) | Date: 2009-10-09 09:16 |
If you want to specify codepoints greater than U+FFFF you have to use u'\Uxxxxxxxx': >>> x = u'\u10380' >>> x.encode('utf-8') '\xe1\x80\xb80' >>> x[0] u'\u1038' >>> x[1] u'0' >>> y = u'\U00010380' >>> y.encode('utf-8') '\xf0\x90\x8e\x80' |
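The difference between the two escapes can be checked directly. The following is a minimal sketch, assuming Python 3.3 or later (where every code point is a single string element); the variable names are illustrative and not part of the original report. It relies on the fact that `\u` consumes exactly four hex digits and `\U` exactly eight, so `'\u10380'` is U+1038 followed by the digit `0`.

```python
# "\u" consumes exactly four hex digits, "\U" exactly eight.
short_escape = "\u10380"      # parsed as U+1038 followed by the literal digit "0"
long_escape = "\U00010380"    # a single code point, U+10380

assert len(short_escape) == 2
assert short_escape[0] == "\u1038" and short_escape[1] == "0"
assert len(long_escape) == 1
assert ord(long_escape) == 0x10380

# In Python 3, encode() returns bytes, so the results carry a b prefix.
short_bytes = short_escape.encode("utf-8")
long_bytes = long_escape.encode("utf-8")
print(short_bytes)  # b'\xe1\x80\xb80'
print(long_bytes)   # b'\xf0\x90\x8e\x80'

# chr() is another way to build the character from its numeric code point.
assert chr(0x10380) == long_escape
```

The trailing `0` in the first result is the ASCII digit left over from the literal, which is why the output looked like a truncated encoding rather than a four-byte sequence.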
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:53 | admin | set | github: 51339 |
2009-10-09 09:16:49 | ezio.melotti | set | status: open -> closed; nosy: + ezio.melotti; messages: +; resolution: not a bug; stage: resolved |
2009-10-09 09:12:33 | msaghaei | create |