[Python-Dev] Adding Japanese Codecs to the distro

Martin v. Löwis martin@v.loewis.de
16 Jan 2003 11:05:55 +0100


"M.-A. Lemburg" <mal@lemburg.com> writes:

> Thoughts?

I'm in favour of adding support for Japanese codecs, but I wonder whether we shouldn't incorporate the C version of Tamito's JapaneseCodecs package instead, despite its size.

I would also suggest that it might be more worthwhile to expose the platform's own codecs (such as iconv on Unix and the code-page conversion functions on Windows). That would give us all CJK codecs on a number of major platforms, with a minimal increase in the size of the Python distribution and with very good performance.

If Suzuki's code is incorporated, I'd like to get independent confirmation that it is actually correct. I know Tamito needed many iterations before his codecs were correct, where "correct" is a somewhat fuzzy term: some really tricky issues have no single correct solution (such as whether \x5c is a backslash or a Yen sign in these encodings). I notice, with surprise, that the actual mapping tables are extracted from Java, through Jython.
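To make the \x5c question concrete, here is a minimal sketch that decodes that single byte with the Japanese codecs bundled with present-day CPython. These are neither Suzuki's nor Tamito's codecs, so the output only shows one possible answer, not the "correct" one. `SAMPLES` and `interpret_0x5c` are hypothetical names. The ISO-2022-JP sample first designates JIS X 0201 Roman (`ESC ( J`), the character set in which 0x5C is the Yen sign.

```python
import unicodedata

# Hypothetical sample table: the byte 0x5C as it appears in each encoding.
SAMPLES = {
    "shift_jis": b"\x5c",
    "cp932": b"\x5c",
    "euc_jp": b"\x5c",
    # Designate JIS X 0201 Roman (ESC ( J), emit 0x5C, return to ASCII (ESC ( B).
    "iso2022_jp": b"\x1b(J\x5c\x1b(B",
}


def interpret_0x5c() -> dict:
    """Return, per codec, the Unicode name(s) that the 0x5C sample decodes to."""
    result = {}
    for codec, raw in SAMPLES.items():
        try:
            text = raw.decode(codec)
        except UnicodeDecodeError as exc:
            # Record rejection instead of crashing; codecs may differ here.
            result[codec] = f"undecodable ({exc.reason})"
            continue
        result[codec] = ", ".join(unicodedata.name(ch) for ch in text)
    return result


if __name__ == "__main__":
    for codec, name in interpret_0x5c().items():
        print(f"{codec:>10}: {name}")
```

CPython's `shift_jis` and `cp932` codecs both resolve the lone byte to REVERSE SOLIDUS, which is a defensible choice for file names and source code, but Japanese text written with JIS X 0201 semantics in mind expects the Yen sign.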

I also dislike the absence of the cp932 encoding in Suzuki's codecs. The suggestion to equate it with "mbcs" on Windows is not convincing, as a) "mbcs" is the system's ANSI code page, so it means cp932 only on Windows installations configured for Japanese, and b) cp932 needs to be processed on other systems, too. I think cp932 could be implemented as a delta to shift-jis, as shown in

http://hp.vector.co.jp/authors/VA003720/lpproj/test/cp932sj.htm

(although I wonder why they don't list the backslash issue as a difference between shift-jis and cp932)
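One way to read "cp932 as a delta to shift-jis" is as a table of the byte sequences where the two encodings disagree, consulted before falling back to a plain Shift-JIS decoder. The sketch below derives such a delta from CPython's bundled `cp932` and `shift_jis` codecs rather than from the page cited above, and covers decoding only; `build_delta` and `decode_cp932_via_delta` are hypothetical helpers, not an existing API.

```python
from typing import Dict, Optional

# Lead bytes of two-byte sequences in cp932 (same ranges as Shift-JIS).
LEAD_BYTES = frozenset([*range(0x81, 0xA0), *range(0xE0, 0xFD)])
TRAIL_BYTES = [b for b in range(0x40, 0xFD) if b != 0x7F]


def try_decode(raw: bytes, encoding: str) -> Optional[str]:
    """Return the decoded text, or None if the codec rejects the bytes."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


def build_delta() -> Dict[bytes, str]:
    """Map every byte sequence that cp932 decodes differently from shift_jis
    (including sequences shift_jis rejects) to its cp932 text."""
    candidates = [bytes([b]) for b in range(0x100) if b not in LEAD_BYTES]
    candidates += [bytes([lead, trail])
                   for lead in sorted(LEAD_BYTES) for trail in TRAIL_BYTES]
    delta = {}
    for raw in candidates:
        cp932_text = try_decode(raw, "cp932")
        if cp932_text is not None and cp932_text != try_decode(raw, "shift_jis"):
            delta[raw] = cp932_text
    return delta


def decode_cp932_via_delta(data: bytes, delta: Dict[bytes, str]) -> str:
    """Decode cp932 bytes: check the delta first, otherwise fall back to the
    shift_jis codec (strict, so invalid input still raises)."""
    out = []
    i = 0
    while i < len(data):
        # A lead byte followed by at least one more byte starts a 2-byte unit.
        width = 2 if data[i] in LEAD_BYTES and i + 1 < len(data) else 1
        unit = data[i:i + width]
        out.append(delta[unit] if unit in delta else unit.decode("shift_jis"))
        i += width
    return "".join(out)
```

For example, 0x8160 lands in the delta because CPython's `shift_jis` decodes it as WAVE DASH (U+301C) while `cp932` yields FULLWIDTH TILDE (U+FF5E). With these particular tables the single byte 0x5C decodes to a backslash under both codecs, so the backslash question does not show up in this delta at all.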

Regards, Martin