Issue 7090: encoding unicode objects greater than FFFF

Created on 2009-10-09 09:12 by msaghaei, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (2)
msg93780 - Author: Mahmoud (msaghaei) Date: 2009-10-09 09:12
Odd behaviour with str.encode, codecs.Codec.encode, or similar functions when dealing with unicode objects above U+FFFF.

With 2.6:
>>> u'\u10380'.encode('utf')
'\xe1\x80\xb80'

With 3.x:
>>> '\u10380'.encode('utf')
b'\xe1\x80\xb80'

The correct output should be: \xf0\x90\x8e\x80
msg93781 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-10-09 09:16
If you want to specify codepoints greater than U+FFFF you have to use u'\Uxxxxxxxx':

>>> x = u'\u10380'
>>> x.encode('utf-8')
'\xe1\x80\xb80'
>>> x[0]
u'\u1038'
>>> x[1]
u'0'
>>> y = u'\U00010380'
>>> y.encode('utf-8')
'\xf0\x90\x8e\x80'
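
The same point can be checked on Python 3 with a minimal sketch like the one below. The variable names are illustrative only and do not come from the report. It assumes nothing beyond the built-in str type: \u consumes exactly four hex digits, so the trailing 0 becomes an ordinary character, while \U consumes exactly eight.

```python
# \uXXXX reads exactly four hex digits; \UXXXXXXXX reads exactly eight.
short_escape = '\u10380'     # U+1038 followed by the literal character '0'
long_escape = '\U00010380'   # the single code point U+10380

# The short form is two characters, the long form is one.
print(len(short_escape), [hex(ord(c)) for c in short_escape])  # 2 ['0x1038', '0x30']
print(len(long_escape), hex(ord(long_escape)))                 # 1 0x10380

# UTF-8 encodes U+1038 in three bytes (plus one byte for '0'),
# and U+10380 in four bytes.
short_bytes = short_escape.encode('utf-8')
long_bytes = long_escape.encode('utf-8')
print(short_bytes)  # b'\xe1\x80\xb80'
print(long_bytes)   # b'\xf0\x90\x8e\x80'

# chr() is an alternative when the code point is only known as an integer.
from_int = chr(0x10380)
print(from_int == long_escape)  # True
```

Using chr() with an integer avoids having to count escape digits by hand, which is the mistake behind the original report.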
History
Date User Action Args
2022-04-11 14:56:53 admin set github: 51339
2009-10-09 09:16:49 ezio.melotti set status: open -> closed; nosy: + ezio.melotti; messages: +; resolution: not a bug; stage: resolved
2009-10-09 09:12:33 msaghaei create