Rewrite of IBM doublebyte charsets
Ulf Zibis Ulf.Zibis at gmx.de
Tue May 19 00:37:38 UTC 2009
On 14.05.2009 23:38, Xueming Shen wrote:
Ulf,
There are 3 goals of this rewrite: (1) shrink the storage size of the EUCTW to a reasonable number; (2) move away from hard-coding the mapping data in the source file to an approach that generates it from mapping files at build time, for easier maintenance in the future; (3) no regression in decoding and encoding performance, decoder startup, or the resulting CoderResult when compared to the existing implementation, with the exception of encoder startup (we need to build it from the b2c). So far I'm happy to see that all of them are achieved. I'm not aiming for a perfect one (actually the purpose of goal (2) is to make future tuning easier).
Yes, the map files are a good starting point for future tuning.
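To make the encoder-startup remark concrete, here is a minimal sketch of deriving a char-to-byte table from a byte-to-char table at startup. The class name `InverseTableSketch`, the flat `char[]` layout indexed by the byte code, and the use of U+FFFD as the "no mapping" sentinel are assumptions for illustration only; this is not the layout used in the actual rewrite.

```java
import java.util.Arrays;

final class InverseTableSketch {
    // Assumed sentinel meaning "no mapping" in both tables.
    static final char UNMAPPABLE = '\uFFFD';

    // b2c[code] holds the char decoded from byte code 'code' (at most 16 bits),
    // or UNMAPPABLE if the code is unassigned. Returns c2b, indexed by char.
    static char[] buildC2B(char[] b2c) {
        if (b2c.length > 0x10000) {
            throw new IllegalArgumentException("byte codes must fit in 16 bits");
        }
        char[] c2b = new char[0x10000];
        Arrays.fill(c2b, UNMAPPABLE);
        for (int code = 0; code < b2c.length; code++) {
            char c = b2c[code];
            // Keep the first byte code seen for a char; later duplicates are ignored.
            if (c != UNMAPPABLE && c2b[c] == UNMAPPABLE) {
                c2b[c] = (char) code;
            }
        }
        return c2b;
    }
}
```

The one-time pass over the whole b2c table is the startup cost the encoder pays in this sketch; the decoder can use b2c directly.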
I would not try to argue which CoderResult is more appropriate, unmappable or malformed. It's hard to draw the line: some codepages/charsets leave some code points for future use, private use, or user-defined characters. You can't make the decision simply by looking at the mapping table; you need to have the standard on your desk to check segment by segment, and in fact I personally don't think it makes much sense to distinguish these two. So I would like to follow the existing behavior, if possible.
I mostly agree with you, and I guess most users don't care about this difference, so they wouldn't run into compatibility problems if they only check CoderResult#isError(). But I think users who are interested in this difference should get the most accurate results, regardless of whether former implementations were faulty.
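For readers less familiar with the API, the difference between the two kinds of caller can be sketched as follows. The class and method names (`CoderResultKinds`, `classify`, `decodeStrict`) are hypothetical helpers, not part of the JDK; only the `CoderResult`, `CharsetDecoder` and `CodingErrorAction` calls are the standard `java.nio.charset` API.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

final class CoderResultKinds {
    // A caller that only checks isError() sees "error or not";
    // a caller that cares can tell malformed from unmappable.
    static String classify(CoderResult cr) {
        if (!cr.isError()) {
            return "no error";   // underflow or overflow
        }
        return cr.isMalformed() ? "malformed" : "unmappable";
    }

    // Decodes with both error actions set to REPORT, so the decoder returns
    // the CoderResult instead of silently replacing or skipping input.
    static CoderResult decodeStrict(Charset cs, byte[] bytes) {
        CharsetDecoder dec = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharBuffer out = CharBuffer.allocate(bytes.length * 2 + 2);
        return dec.decode(ByteBuffer.wrap(bytes), out, true);
    }
}
```

Calling `classify(decodeStrict(cs, bytes))` on the old and the new implementation of a charset with the same input would show whether the two agree on malformed versus unmappable; a caller that only tests `isError()` would not notice a change between them.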
I hope you are inspired by my suggestions from yesterday ;-)
-Ulf