[Python-Dev] Adding Japanese Codecs to the distro
M.-A. Lemburg mal@lemburg.com
Thu, 16 Jan 2003 12:22:58 +0100
Martin v. Löwis wrote:
"M.-A. Lemburg" <mal@lemburg.com> writes:
Thoughts ?
I'm in favour of adding support for Japanese codecs, but I wonder whether we shouldn't incorporate the C version of the Japanese codecs package instead, despite its size.
I was suggesting to make Suzuki's codecs the default. That doesn't prevent Tamito's codecs from working, since these are inside a package.
If someone wants the C codecs, we should provide them as a separate download right alongside the standard distro (as discussed several times before).
Note that the C codecs are not as easy to adapt to special needs as the Python ones. While this may seem unnecessary, I've heard from a few people that companies in particular tend to extend the mappings with their own set of company-specific code points.
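To illustrate why a table-driven Python codec is easy to extend, here is a minimal sketch using the charmap helpers in the standard `codecs` module. It is not taken from Suzuki's or Tamito's packages: the base table is a toy single-byte ASCII mapping, and the names `BASE_DECODING`, `COMPANY_EXTRAS`, `decode` and `encode` are hypothetical. The extra code points are placed in the Unicode Private Use Area purely as an assumption about what a company might do.

```python
import codecs

# Hypothetical base mapping: byte value -> Unicode code point.
# Only the ASCII range is filled in, to keep the sketch short.
BASE_DECODING = {b: b for b in range(0x80)}

# Hypothetical company-specific additions, mapped into the
# Private Use Area (U+E000 onwards).
COMPANY_EXTRAS = {0x80: 0xE000, 0x81: 0xE001}

# Extending the codec is just a dictionary merge.
DECODING_MAP = {**BASE_DECODING, **COMPANY_EXTRAS}
ENCODING_MAP = codecs.make_encoding_map(DECODING_MAP)


def decode(data, errors="strict"):
    """Decode bytes with the extended table; returns (text, length)."""
    return codecs.charmap_decode(data, errors, DECODING_MAP)


def encode(text, errors="strict"):
    """Encode text with the extended table; returns (bytes, length)."""
    return codecs.charmap_encode(text, errors, ENCODING_MAP)
```

A multi-byte Japanese codec needs more machinery than this, but the point carries over: with the mapping held in a Python dictionary, adding code points does not require recompiling anything.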
I would also suggest that it might be more worthwhile to expose platform codecs, which would give us all CJK codecs on a number of major platforms, with a minimum increase in the size of the Python distribution, and with very good performance.
+1
We already have this on Windows (via the mbcs codec). If you could contribute your iconv codecs under the PSF license we'd go a long way in that direction on Unix as well.
If Suzuki's code is incorporated, I'd like to get independent confirmation that it is actually correct.
Since he built the codecs on the mappings in Java, this looks like enough third party confirmation already.
I know Tamito has taken many iterations until it was correct, where "correct" is a somewhat fuzzy term, since there are some really tricky issues for which there is no single correct solution (like whether \x5c is a backslash or a Yen sign, in these encodings). I notice (with surprise) that the actual mapping tables are extracted from Java, through Jython.
Indeed. I think that this kind of approach is a good one in the light of the "correctness" problems you mention above. It also helps with the compatibility side.
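The \x5c question raised above can be made concrete with a small sketch. It assumes a current CPython, where a `shift_jis` codec ships in the standard library and decodes the single byte 0x5C as a backslash. The helper `decode_sjis` and its `yen_sign` flag are hypothetical names, shown only to illustrate how an application could choose the other interpretation after decoding.

```python
def decode_sjis(data, yen_sign=False):
    """Decode Shift_JIS bytes, optionally reading byte 0x5C as a Yen sign.

    The standard library codec yields U+005C (backslash) for the single
    byte 0x5C. With yen_sign=True, every such backslash is replaced by
    U+00A5 (YEN SIGN) after decoding.
    """
    text = data.decode("shift_jis")
    if yen_sign:
        text = text.replace("\\", "\u00a5")
    return text


# Byte 0x5C on its own: backslash by default, Yen sign on request.
price_default = decode_sjis(b"\x5c100")
price_yen = decode_sjis(b"\x5c100", yen_sign=True)

# 0x5C as the trail byte of a two-byte character is unaffected,
# because the pair 0x83 0x5C decodes to KATAKANA LETTER SO.
katakana_so = decode_sjis(b"\x83\x5c", yen_sign=True)
```

Neither reading is wrong as such, which is why a reference implementation to compare against helps.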
I also dislike the absence of the cp932 encoding in Suzuki's codecs. The suggestion to equate this to "mbcs" on Windows is not convincing, as a) "mbcs" does not mean cp932 on all Windows installations, and b) cp932 needs to be processed on other systems, too. I think cp932 could be implemented as a delta to shift-jis, as shown in
http://hp.vector.co.jp/authors/VA003720/lpproj/test/cp932sj.htm
(although I wonder why they don't list the backslash issue as a difference between shift-jis and cp932)
As always: contributions are welcome :-)
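For anyone wanting to pick this up, the "delta to shift-jis" idea might look roughly like the sketch below. It assumes a current CPython with a `shift_jis` codec in the standard library as the base table. `CP932_DELTA` holds only two sample entries and is nowhere near a complete list of differences; `is_lead_byte` and `decode_with_delta` are hypothetical names, and only decoding is covered.

```python
# Hypothetical delta table: byte sequences whose cp932 meaning differs
# from, or is missing in, the base shift_jis table. Two samples only.
CP932_DELTA = {
    b"\x81\x60": "\uff5e",  # FULLWIDTH TILDE (base table: WAVE DASH)
    b"\x87\x40": "\u2460",  # CIRCLED DIGIT ONE (row 13, not in JIS X 0208)
}


def is_lead_byte(value):
    """Return True if the byte starts a two-byte sequence."""
    return 0x81 <= value <= 0x9F or 0xE0 <= value <= 0xFC


def decode_with_delta(data, delta=CP932_DELTA, base="shift_jis"):
    """Decode bytes, consulting the delta table before the base codec."""
    out = []
    i = 0
    while i < len(data):
        width = 2 if is_lead_byte(data[i]) else 1
        chunk = data[i:i + width]
        if chunk in delta:
            out.append(delta[chunk])
        else:
            # Not overridden: fall back to the base codec, which raises
            # UnicodeDecodeError for sequences it does not know.
            out.append(chunk.decode(base))
        i += width
    return "".join(out)
```

Decoding character by character is slow, so this is only meant to show the shape of the approach; a real codec would merge the delta into the lookup tables once, up front.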
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/