msg332285
Author: BLKSerene (blkserene)
Date: 2018-12-21 10:08
There are a few minor issues with the encodings supported by Python:
1. "tis260" is an alias for "tactis". "tis260" is probably a typo for "tis620", and "tactis" is not an encoding that Python supports (I can't find any information about this encoding on Google).
2. "mac_latin2" and "mac_centeuro" are the same encoding (their decoding tables are identical), but they are provided as two separate encodings under different names ("maccentraleurope" is an alias for "mac_latin2", but "mac_centeuro" is not).
3. The same problem exists for "latin_1" and "iso8859_1" ("iso_8859_1" is an alias for "latin_1", but "iso8859_1" is not).
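Points 1 to 3 can be checked from a Python prompt. Below is a minimal sketch, assuming a standard CPython installation; is_available and same_decoding are hypothetical helper names, and comparing the decoded output of all 256 byte values is just one convenient way to test whether two names behave as the same 8-bit codec.

```python
import codecs


def is_available(name: str) -> bool:
    """Return True if codecs.lookup() can resolve `name` to a codec."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


def same_decoding(a: str, b: str) -> bool:
    """Return True if codecs `a` and `b` decode all 256 byte values identically."""
    every_byte = bytes(range(256))
    # errors="replace" keeps the comparison from raising on bytes that a
    # codec leaves undefined; identical mappings still give identical text.
    return (every_byte.decode(a, errors="replace")
            == every_byte.decode(b, errors="replace"))
```

For example, is_available("tis260") and is_available("tactis") both return False, while is_available("tis_620") and same_decoding("latin_1", "iso8859_1") return True; same_decoding("mac_latin2", "mac_centeuro") is the corresponding check for point 2.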
|
|
msg333115
Author: Ashwin Ramaswami (epicfaace) *
Date: 2019-01-06 17:24
"iso8859_1" is already an alias for "latin_1", though. https://github.com/python/cpython/blob/master/Lib/encodings/aliases.py#L432
|
|
msg336493
Author: Inada Naoki (methane) *
Date: 2019-02-25 00:09
Removing the unused alias is OK, but I'm not sure about adding a new alias. The encodings/ package contains both mac_centeuro.py and mac_latin2.py. Why is an alias needed if mac_centeuro.py is not removed?
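Aliases live in a plain dict, encodings.aliases.aliases, which maps normalized alias names to codec module names. In encodings/__init__.py, search_function tries the aliased module name before the name itself, so an alias entry for a name would shadow a module file of the same name rather than coexist with it. A rough sketch of resolving a name against that table; resolve_alias is a hypothetical helper, and its normalization (lower-case, '-' and ' ' to '_') is a simplification of what codecs.lookup() and encodings.normalize_encoding() actually do.

```python
from encodings.aliases import aliases


def resolve_alias(name: str) -> str:
    """Return the codec module name that `name` maps to in the alias table.

    Simplified, hypothetical normalization: strip, lower-case, and turn
    '-' and ' ' into '_'. Names with no alias entry are returned normalized.
    """
    normalized = name.strip().lower().replace("-", "_").replace(" ", "_")
    return aliases.get(normalized, normalized)
```

With it, resolve_alias("iso8859_1") and resolve_alias("ISO-8859-1") both give "latin_1", and resolve_alias("maccentraleurope") gives "mac_latin2".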
|
|
msg336496
Author: BLKSerene (blkserene)
Date: 2019-02-25 04:36
I suppose mac_centeuro can be removed, since it is identical to mac_latin2 and there are already some aliases for mac_latin2; mac_centeuro can then be added as an alias for mac_latin2.
I'm not sure why both latin_1 and iso8859_1 are supported (they are identical). The docs say: "CPython implementation detail: Some common encodings can bypass the codecs lookup machinery to improve performance. These optimization opportunities are only recognized by CPython for a limited set of (case insensitive) aliases: utf-8, utf8, latin-1, latin1, iso-8859-1, iso8859-1, mbcs (Windows only), ascii, us-ascii, utf-16, utf16, utf-32, utf32, and the same using underscores instead of dashes. Using alternative aliases for these encodings may result in slower execution."
I'm also not sure whether this would matter here.
|
|
msg336497
Author: Inada Naoki (methane) *
Date: 2019-02-25 05:12
@lemburg I confirmed that mac_latin2 and mac_centeuro are identical, even though they are generated from different sources:
>>> from encodings import mac_latin2, mac_centeuro
>>> mac_latin2.decoding_table == mac_centeuro.decoding_table
True
What do you think about removing mac_centeuro and adding an alias to mac_latin2?
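The REPL check generalizes to any list of charmap codec modules. A minimal sketch, assuming the modules in the encodings package expose a module-level decoding_table string (the generated charmap codecs do; latin_1, which delegates to built-in functions, does not); load_decoding_table and group_identical are hypothetical helpers. Modules that fail to import, such as mac_centeuro on an interpreter where it has been removed, are skipped instead of raising.

```python
import importlib
from collections import defaultdict


def load_decoding_table(modname: str):
    """Return encodings.<modname>.decoding_table, or None if unavailable."""
    try:
        module = importlib.import_module("encodings." + modname)
    except ImportError:
        # Module missing, or it imports a name this platform lacks.
        return None
    # Codecs without a charmap table (e.g. latin_1) yield None here.
    return getattr(module, "decoding_table", None)


def group_identical(modnames):
    """Group codec module names whose decoding tables are equal."""
    groups = defaultdict(list)
    for name in modnames:
        table = load_decoding_table(name)
        if table is not None:
            # The table is a str, so it can serve directly as a dict key.
            groups[table].append(name)
    return [names for names in groups.values() if len(names) > 1]
```

Calling group_identical(["mac_latin2", "mac_centeuro"]) yields [["mac_latin2", "mac_centeuro"]] where both modules still exist, and an empty list once mac_centeuro.py is gone.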
|
|
msg344749
Author: Marc-Andre Lemburg (lemburg) *
Date: 2019-06-05 16:55
1. Background for "tactis": https://github.com/python/cpython/commit/4fd73f0465ba11c22f0986d04cf91b387ed22c47
# The codecs for these encodings are not distributed with the
# Python core, but are included here for reference, since the
# locale module relies on having these aliases available.
This codec was available as a separate package at the time. Later the CJK codecs were added to the stdlib, but this codec was not. I guess it's fine to remove the alias.
2. If the mappings are identical, keeping just one and making the other an alias is fine. The same goes for aliases of those mapping names.
3. I think we already resolved this some time ago.
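Inside the stdlib, making one name an alias of another means adding an entry to encodings/aliases.py. Outside the stdlib, a comparable effect can be had by registering a search function with codecs.register(); the sketch below uses that substitute technique, not the stdlib mechanism, and the alias name my_centeuro and the function alias_search are hypothetical.

```python
import codecs


def alias_search(name: str):
    """Search function that maps the hypothetical alias 'my_centeuro' to mac_latin2."""
    # codecs.lookup() normalizes (at least lower-cases) the name before
    # calling search functions, so only the lower-case form is checked.
    if name == "my_centeuro":
        return codecs.lookup("mac_latin2")
    return None  # let other search functions handle every other name


codecs.register(alias_search)
```

After registration, b"\x80".decode("my_centeuro") gives the same text as b"\x80".decode("mac_latin2"), because both names resolve to the same CodecInfo.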
|
|
msg344773
Author: Cheryl Sabella (cheryl.sabella) *
Date: 2019-06-05 22:18
New changeset c4c15ed7a2c7c2a1983e88b89c244d121eb3e512 by Cheryl Sabella (Ashwin Ramaswami) in branch 'master': bpo-35551: encodings update (GH-11446) https://github.com/python/cpython/commit/c4c15ed7a2c7c2a1983e88b89c244d121eb3e512
|
|
msg344788
Author: Inada Naoki (methane) *
Date: 2019-06-06 05:39
New changeset cb65202520e7959196a2df8215692de155bf0cc8 by Inada Naoki in branch 'master': bpo-35551: remove mac_centeuro encoding (GH-13856) https://github.com/python/cpython/commit/cb65202520e7959196a2df8215692de155bf0cc8
|
|