Issue 24041: Implement Mac East Asian encodings properly

Created on 2015-04-23 18:52 by Behdad.Esfahbod, last changed 2022-04-11 14:58 by admin.

Messages (7)
msg241876 - (view) Author: Behdad Esfahbod (Behdad.Esfahbod) Date: 2015-04-23 18:52
encodings.aliases has this at its tail, even on master today [0]:

    # temporary mac CJK aliases, will be replaced by proper codecs in 3.1
    'x_mac_japanese'      : 'shift_jis',
    'x_mac_korean'        : 'euc_kr',
    'x_mac_simp_chinese'  : 'gb2312',
    'x_mac_trad_chinese'  : 'big5',

A full implementation would be appreciated.

[0] https://github.com/python/cpython/blob/master/Lib/encodings/aliases.py#L539
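To see what these temporary aliases resolve to on a given interpreter, a minimal sketch along the following lines can help. It only reads the `aliases` dictionary from `encodings.aliases`; the helper name `resolve_mac_cjk_aliases` is made up for illustration, and the result depends on the Python version in use.

```python
from encodings.aliases import aliases

# The four alias names quoted in the report above.
MAC_CJK_NAMES = (
    "x_mac_japanese",
    "x_mac_korean",
    "x_mac_simp_chinese",
    "x_mac_trad_chinese",
)


def resolve_mac_cjk_aliases():
    """Return {alias: target codec name}, with None if the alias is absent.

    Hypothetical helper: it inspects the alias table only and does not
    check whether the target codec really implements the Mac encoding.
    """
    return {name: aliases.get(name) for name in MAC_CJK_NAMES}


if __name__ == "__main__":
    for alias, target in resolve_mac_cjk_aliases().items():
        print(alias, "->", target)
```

If an alias maps to a plain `shift_jis`, `euc_kr`, `gb2312` or `big5` codec, the Apple-specific code points are not handled by that interpreter.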
msg241877 - (view) Author: Behdad Esfahbod (Behdad.Esfahbod) Date: 2015-04-23 18:54
Also, I'm not sure about the 'x_' prefix. It's not kept for the other mac encodings. There's a useful table here: https://github.com/behdad/fonttools/issues/236
msg241924 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2015-04-24 08:18
The "x_" prefix was added as a reminder and a way to document the desire to look into this at some point: https://github.com/python/cpython/commit/c696b47b10db1fa22b77ecfe1af392b3d62aab61

Before adding more codecs, we always ask whether these are in actual use. Can you provide some evidence of this? We will also need official references to the definitions of the Mac encodings.

Thanks.
msg241926 - (view) Author: Behdad Esfahbod (Behdad.Esfahbod) Date: 2015-04-24 08:34
Thanks Marc-Andre. If the x_ was indeed added for that reason, it's quite a coincidence, because the MIME name of these encodings also starts with x-mac-..., so I assumed that's where the x_ comes from.

The mappings are available at the Unicode website:
http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/JAPANESE.TXT
http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/CHINTRAD.TXT
http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/KOREAN.TXT
http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/CHINSIMP.TXT

As for actual use, they are part of the OpenType standard. So by user request, I had to implement them last week in the FontTools Python library. This is useful for people dealing with old and legacy fonts, especially in the process of converting them to Unicode-compatible fonts.
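As a rough illustration of how such mapping tables could be consumed, here is a minimal parser sketch. It assumes a simplified line format: a hex byte code, whitespace, a hex Unicode value or a `+`-joined sequence of them, then an optional `#` comment. The function name `parse_mapping` and the sample lines are illustrative and not copied from the Apple files; the real files may contain constructs this sketch does not handle.

```python
def parse_mapping(text):
    """Parse 'code<ws>unicode[+unicode...]  # comment' lines into a dict.

    Keys are integer byte codes; values are the mapped Unicode strings.
    Assumption: every field is plain hex such as 0x41 or 0x2026+0xF87F.
    """
    table = {}
    for raw_line in text.splitlines():
        # Drop the trailing comment and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # code with no mapping: skip it
        code = int(fields[0], 16)
        # The Unicode column may be a sequence joined by '+'.
        chars = "".join(chr(int(part, 16)) for part in fields[1].split("+"))
        table[code] = chars
    return table


# Illustrative sample only; check real entries against the mapping files.
SAMPLE = (
    "# header comment\n"
    "0x41\t0x0041\t# LATIN CAPITAL LETTER A\n"
    "0xFF\t0x2026+0xF87F\t# illustrative multi-code-point entry\n"
)

if __name__ == "__main__":
    for code, chars in parse_mapping(SAMPLE).items():
        print(hex(code), [hex(ord(c)) for c in chars])
```

A table built this way could then be diffed against an existing codec to find the entries where the Mac variant differs.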
msg241928 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2015-04-24 09:07
On 24.04.2015 10:34, Behdad Esfahbod wrote:
>
> Thanks Marc-Andre. If the x_ was indeed added for that reason, it's quite a coincidence, because the MIME name of these encodings also starts with x-mac-..., so I assumed that's where the x_ comes from.

Oh, I didn't know that :-)

Hmm, I can't find the names listed as IANA charsets, so the "x-" prefix probably means non-standard.
http://www.iana.org/assignments/character-sets/character-sets.xhtml

> The mappings are available at the Unicode website:
> http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/JAPANESE.TXT
> http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/CHINTRAD.TXT
> http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/KOREAN.TXT
> http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/CHINSIMP.TXT
>
> As for actual use, they are part of the OpenType standard. So by user request, I had to implement them last week in the FontTools Python library. This is useful for people dealing with old and legacy fonts, especially in the process of converting them to Unicode-compatible fonts.

This may be an indication that it's better to put those codecs into a PyPI package, rather than Python itself. The above tables are huge (as are most Asian codec tables).
msg241969 - (view) Author: Behdad Esfahbod (Behdad.Esfahbod) Date: 2015-04-24 18:34
They are a rather minor change on top of the existing Asian encodings. So implementing them in Python might be easier. I have a half-done version of those. I can try finishing and post it back here.
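One way to read "a minor change on top of the existing Asian encodings" is a decoder that delegates to the base codec and consults a small override table only for bytes the base codec rejects. The sketch below is not the FontTools or CPython implementation; `decode_with_overrides` is a hypothetical helper, and the single override entry is an illustrative assumption that should be checked against the Apple mapping files.

```python
def decode_with_overrides(data, base, overrides):
    """Decode `data` with codec `base`, patching bytes it cannot decode.

    `overrides` maps undecodable byte strings to replacement text.
    Limitation of this sketch: bytes the base codec accepts but the Mac
    variant maps differently are NOT handled here.
    """
    pieces = []
    pos = 0
    while pos < len(data):
        chunk = data[pos:]
        try:
            pieces.append(chunk.decode(base))
            break
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are offsets into `chunk`.
            bad = chunk[exc.start:exc.end]
            if bad not in overrides:
                raise
            pieces.append(chunk[:exc.start].decode(base))
            pieces.append(overrides[bad])
            pos += exc.end
    return "".join(pieces)


# Illustrative override (assumption): treat byte 0x80 as a backslash.
EXAMPLE_OVERRIDES = {b"\x80": "\\"}

if __name__ == "__main__":
    print(decode_with_overrides(b"A\x80B", "shift_jis", EXAMPLE_OVERRIDES))
```

Going through the error path keeps the base codec's tables untouched, so only the handful of differing entries needs to be maintained.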
msg241970 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2015-04-24 18:35
On 24.04.2015 20:34, Behdad Esfahbod wrote:
>
> They are a rather minor change on top of the existing Asian encodings. So implementing them in Python might be easier. I have a half-done version of those. I can try finishing and post it back here.

If it's only a smaller patch, that would work fine, I guess.
History
Date User Action Args
2022-04-11 14:58:16 admin set github: 68229
2015-04-24 18:35:15 lemburg set messages: +
2015-04-24 18:34:01 Behdad.Esfahbod set messages: +
2015-04-24 09:07:52 lemburg set messages: +
2015-04-24 08:34:56 Behdad.Esfahbod set messages: +
2015-04-24 08:18:13 lemburg set messages: +
2015-04-23 20:17:41 ned.deily set nosy: + lemburg, hyeshik.chang
2015-04-23 18:54:01 Behdad.Esfahbod set messages: +
2015-04-23 18:52:10 Behdad.Esfahbod create