Issue 12742: Add support for CESU-8 encoding
Created on 2011-08-12 14:01 by moese, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Messages (4)
msg141958 - (view) | Author: Moese (moese) | Date: 2011-08-12 14:01 |
CESU-8 is identical to UTF-8 except that it encodes characters outside the BMP as a UTF-16 surrogate pair, with each surrogate encoded as 3 bytes (6 bytes in total instead of UTF-8's 4). http://en.wikipedia.org/wiki/CESU-8 It is used by some web APIs.
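To make the difference concrete, here is a minimal sketch (not part of the original report) that encodes a single character both ways. The helper name `utf8_and_cesu8` is hypothetical, and the sketch assumes Python 3, where the `surrogatepass` error handler lets the UTF-8 codec emit the 3-byte form of a lone surrogate.

```python
def utf8_and_cesu8(ch):
    """Return (utf8_bytes, cesu8_bytes) for a single character.

    Hypothetical helper for illustration only.
    """
    cp = ord(ch)
    utf8 = ch.encode("utf-8")
    if cp <= 0xFFFF:
        # Characters inside the BMP are encoded identically in both schemes.
        return utf8, utf8
    # Split the code point into a UTF-16 surrogate pair.
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)
    low = 0xDC00 + (v & 0x3FF)
    # Encode each surrogate as its own 3-byte sequence.
    cesu8 = b"".join(
        chr(s).encode("utf-8", "surrogatepass") for s in (high, low)
    )
    return utf8, cesu8


utf8, cesu8 = utf8_and_cesu8("\U0001F600")
print(utf8.hex(), cesu8.hex())
```

For U+1F600 the UTF-8 form is 4 bytes and the CESU-8 form is 6 bytes, while any BMP character yields the same bytes in both.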
msg143020 - (view) | Author: Ezio Melotti (ezio.melotti) | Date: 2011-08-26 16:47 |
Can you provide some examples? The page you linked says "It should be used exclusively for internal processing and never for external data exchange.", so I'm not sure why these APIs would want to use it.
msg143138 - (view) | Author: Moese (moese) | Date: 2011-08-29 11:50 |
It's an internal web API at the place I work for. To be able to use it from Python in some form, I wrote a workaround that just replaces everything outside the BMP:

    # replace characters outside BMP with 'REPLACEMENT CHARACTER' (U+FFFD)
    def cesu8_to_utf8(text):
        result = ""
        index = 0
        length = len(text)
        while index < length:
            if text[index] < "\xf0":
                result += text[index]
                index += 1
            else:
                result += "\xef\xbf\xbd"  # u"\ufffd".encode("utf8")
                index += 4
        return result

Now that I look at the workaround again, I'm not even sure it's about CESU-8 (it strips Unicode characters encoded as 4 bytes, not pairs of 3-byte surrogates). However, I can see why there would be little interest in adding this encoding.
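For comparison, a hedged sketch of a transcoder that keeps the supplementary characters instead of replacing them. It is not the reporter's code: it assumes Python 3.4 or later (where `surrogatepass` also works with the UTF-16 codecs) and assumes the input really is CESU-8, with surrogates arriving in well-formed pairs. The function name is reused from the workaround only for readability.

```python
def cesu8_to_utf8(data):
    """Transcode CESU-8 bytes to UTF-8 bytes (illustrative sketch)."""
    # Step 1: decode the 3-byte sequences into individual surrogate code points.
    units = data.decode("utf-8", "surrogatepass")
    # Step 2: round-trip through UTF-16 so each surrogate pair is merged
    # into one supplementary character. A lone surrogate raises
    # UnicodeDecodeError here.
    text = units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
    # Step 3: emit regular UTF-8.
    return text.encode("utf-8")


print(cesu8_to_utf8(b"A\xed\xa0\xbd\xed\xb8\x80"))
```

Note that this sketch is lenient: it also accepts ordinary 4-byte UTF-8 sequences, which strict CESU-8 does not allow.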
msg143139 - (view) | Author: Ezio Melotti (ezio.melotti) | Date: 2011-08-29 12:22 |
I'm going to reject this. If people need it, they can always implement it using the codecs module. |
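As a rough illustration of what implementing it with the `codecs` module could look like, the sketch below registers a stateless codec through `codecs.register`. The codec name `cesu-8` and all helper names are assumptions made for this example, not an existing Python codec. The sketch ignores the `errors` argument and has no incremental or stream support.

```python
import codecs
import re

# Matches every character outside the BMP.
_SUPPLEMENTARY = re.compile("[\U00010000-\U0010FFFF]")


def _split(match):
    # Replace one supplementary character with its surrogate pair.
    v = ord(match.group()) - 0x10000
    return chr(0xD800 + (v >> 10)) + chr(0xDC00 + (v & 0x3FF))


def cesu8_encode(text, errors="strict"):
    # 'errors' is accepted for API compatibility but ignored in this sketch.
    data = _SUPPLEMENTARY.sub(_split, text).encode("utf-8", "surrogatepass")
    return data, len(text)


def cesu8_decode(data, errors="strict"):
    data = bytes(data)  # the codec machinery may pass a memoryview
    units = data.decode("utf-8", "surrogatepass")
    # Merge surrogate pairs; a lone surrogate raises UnicodeDecodeError.
    text = units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
    return text, len(data)


def _search(name):
    # Lookup names arrive lowercased; newer Python versions also turn
    # hyphens into underscores, so normalise before comparing.
    if name.replace("-", "_") in ("cesu_8", "cesu8"):
        return codecs.CodecInfo(
            name="cesu-8", encode=cesu8_encode, decode=cesu8_decode
        )
    return None


codecs.register(_search)

print("a\U0001F600".encode("cesu-8"))
```

Once registered, the codec is available through the usual `str.encode` and `bytes.decode` calls for the lifetime of the process.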
History
Date | User | Action | Args |
---|---|---|---|
2022-04-11 14:57:20 | admin | set | github: 56951 |
2011-08-29 12:22:30 | ezio.melotti | set | status: open -> closed; resolution: rejected; messages: +; stage: resolved |
2011-08-29 11:50:11 | moese | set | messages: + |
2011-08-26 16:47:45 | ezio.melotti | set | nosy: + ezio.melotti; messages: + |
2011-08-12 17:32:14 | eric.araujo | set | nosy: + lemburg; components: + Library (Lib); versions: + Python 3.3, - Python 3.4 |
2011-08-12 14:01:38 | moese | create |