Issue 854511: Thai encoding alias for 'cp874'

Created on 2003-12-05 03:16 by kamthorn, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name | Uploaded | Description
python-cvs-thai-encoding-alias-2.diff | kamthorn, 2003-12-05 16:15 | patch to add Thai encoding aliases
Messages (7)
msg54076 - Author: Kamthorn Krairaksa (kamthorn) Date: 2003-12-05 03:16
I suggest adding 'tis_620', 'ibm874', 'iso_8859_11', 'iso8859_11' and 'windows-874' as aliases for 'cp874' in encodings/aliases.py.
msg54077 - Author: Kamthorn Krairaksa (kamthorn) Date: 2003-12-05 05:00
Sorry, that should be 'windows_874', not 'windows-874'.
msg54078 - Author: Kamthorn Krairaksa (kamthorn) Date: 2003-12-05 08:46
This patch adds the Thai encoding aliases to aliases.py.
msg54079 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2003-12-05 09:57
Thanks for the suggestion. Before we can add the aliases, however, we need a reference which clearly says that these codec names all refer to the same encoding as cp874, especially since you seem to have a typo in tis_620: the only reference I could find mentioned tis_602.
msg54080 - Author: Kamthorn Krairaksa (kamthorn) Date: 2003-12-05 16:11
Please refer to the page http://linux.thai.net/thep/mlit/countries.html
There are only two Thai character encoding standards: 'tis-620' and 'iso-8859-11'. The former is maintained by the Thai Industrial Standards Institute (http://www.tisi.go.th/); you can see details at http://www.inet.co.th/cyberclub/trin/thairef/tis620-iso10646.html. The latter is maintained by ISO (http://anubis.dkuug.dk/JTC1/SC2/open/02n3333.pdf). Both of them refer to code page 874.
There are also some non-standard Thai character encodings that refer to code page 874: 'windows-874', 'ibm874', 'x-mac-thai' and 'tactis' (this adds x-mac-thai and tactis to my original list).
The name of the Thai character encoding is tis-620, not tis-602 as you mentioned.
Summary:
- 'tis620', 'tis_620', 'ibm874', 'iso_8859_11', 'iso8859_11', 'windows-874', 'x-mac-thai' and 'tactis' should be aliases for 'cp874'.
Additionally, I found that 'tis260' is listed as an alias for 'tactis' in aliases.py. I am sure 'tis260' is a typo, and 'tactis' itself is missing, so I suggest removing that entry (please see my updated patch).
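For readers who want to check how the names in the summary above behave on their own interpreter, here is a minimal sketch. The helper `resolve_names` and the constant `PROPOSED_NAMES` are hypothetical names introduced for illustration; only `codecs.lookup` is a real standard-library call. The result depends on the Python version, so no particular output is assumed.

```python
import codecs

# Names proposed in this issue. Whether each one resolves depends on the
# Python version in use, so nothing here assumes a particular result.
PROPOSED_NAMES = [
    "tis620", "tis_620", "ibm874", "iso_8859_11", "iso8859_11",
    "windows-874", "x-mac-thai", "tactis",
]


def resolve_names(names):
    """Map each encoding name to its canonical codec name, or None if unknown."""
    resolved = {}
    for name in names:
        try:
            resolved[name] = codecs.lookup(name).name
        except LookupError:
            # The interpreter has no codec or alias registered under this name.
            resolved[name] = None
    return resolved


if __name__ == "__main__":
    for name, canonical in resolve_names(PROPOSED_NAMES).items():
        print(f"{name!r:16} -> {canonical!r}")
```

A name that resolves to a codec other than 'cp874' is treated by that interpreter as a distinct encoding rather than an alias.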
msg54081 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2004-01-18 09:38
The code sets should not be aliased. In CP 874, \x80 is EURO SIGN. In TIS 620, it is (apparently) unassigned (the same goes for all other characters in the range \x80..\xa0). In other words, CP 874 is a superset of TIS 620. Closing the request as rejected.
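The difference described above can be observed directly. The following is a rough sketch, assuming an interpreter that ships the `cp874`, `tis_620` and `iso8859_11` codecs (current CPython releases do); the helper `decode_byte` is a hypothetical name. It decodes a single byte under each codec and records `None` where a codec rejects the byte or is unavailable.

```python
def decode_byte(raw, codec_names=("cp874", "tis_620", "iso8859_11")):
    """Decode `raw` under each codec; None marks a rejected byte or missing codec."""
    results = {}
    for name in codec_names:
        try:
            results[name] = raw.decode(name)
        except (UnicodeDecodeError, LookupError):
            results[name] = None
    return results


# 0x80 lies in the range where the code sets differ.
euro = decode_byte(b"\x80")

# 0xA1 is THAI CHARACTER KO KAI, part of the shared Thai block.
ko_kai = decode_byte(b"\xa1")

if __name__ == "__main__":
    print("0x80:", euro)
    print("0xA1:", ko_kai)
```

Only `cp874` yields U+20AC EURO SIGN for 0x80, while all three codecs agree on 0xA1. This matches the superset relationship given as the reason for rejecting a plain alias.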
msg300976 - Author: (era) Date: 2017-08-29 08:48
Closing the entire enhancement request just because one detail is off seems insane. Anyway, until the day in the distant future when Python can support encoding names in common circulation, http://stackoverflow.com/a/1064191/874188 offers a crude workaround.
    import encodings
    if 'windows_874' not in encodings.aliases.aliases:
        encodings.aliases.aliases['windows_874'] = 'cp874'
This is tricky in a number of ways; in practice, this snippet needs to be at the very start of your source file. Also, the underscore is correct even for email encoding names like =?windows-874?Q?hello=3F?= which use a dash (the dash gets remapped to underscore internally when looking up the encoding alias).
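The same workaround can be wrapped in a small helper. This is only a sketch: `add_codec_alias` is a hypothetical name, and it relies on the `encodings.aliases.aliases` dictionary used in the message above. My understanding is that the `encodings` package also caches failed lookups, which would explain why the alias has to be registered before the name is first looked up; treat that as an assumption rather than documented behaviour.

```python
import codecs
import encodings.aliases


def add_codec_alias(alias, target):
    """Register `alias` for codec `target` unless the alias is already defined."""
    # Alias keys are stored with underscores; hyphens in a requested name
    # are converted before the alias table is consulted.
    key = alias.lower().replace("-", "_")
    encodings.aliases.aliases.setdefault(key, target)
    return key


# Run this before any code tries to look up the name.
add_codec_alias("windows-874", "cp874")

if __name__ == "__main__":
    print(codecs.lookup("windows_874").name)
```

Using `setdefault` leaves any existing entry untouched, so the call is harmless on an interpreter that already knows the alias.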
History
Date User Action Args
2022-04-11 14:56:01 admin set github: 39666
2017-08-29 08:48:02 era set nosy: + era messages: +
2003-12-05 03:16:34 kamthorn create