Rewrite of IBM doublebyte charsets
Ulf Zibis Ulf.Zibis at gmx.de
Mon May 18 13:54:46 UTC 2009
- Previous message: Rewrite of IBM doublebyte charsets
- Next message: Rewrite of EUC_TW
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 17.05.2009 23:00, Xueming Shen wrote:
> Ulf Zibis wrote:
>> *** Encoder suggestions:
>> (26) Why copy the String to a char[] in initC2B()? String access should be just as fast:
>>   - char[] sb = b2cSB.toCharArray();
>>   - char[] db = b2c[i].toCharArray();
>> -Ulf
>
> Because the b2c tables need to be updated before they are used to generate the c2b tables if there is a b2cNR table. (That means multiple "bytes" are mapped to one and the same "char"; when we do c->b, we need to know which "bytes" to map to, and this is specified in the .nr map.) In theory we only need to do that if b2cNR is present, but I don't want to keep two paths. A possible optimization is to pass in a char[] instead of a String, and then only make a copy when necessary.
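The update-then-invert step described above can be sketched roughly as follows. This is a simplified, single-byte illustration and not the actual DoubleByte code: the class name `NrSketch`, the `int[]` form of the .nr entries, and the use of U+FFFD as the "unmappable" marker are all assumptions made for the example.

```java
// Hypothetical sketch (not the JDK's DoubleByte code): single-byte tables only.
class NrSketch {
    // Assumed marker for "no mapping" in the b2c table.
    static final char UNMAPPABLE = '\uFFFD';

    // b2c: String indexed by byte value.
    // nr:  byte values listed in a *.nr file, i.e. decode-only mappings.
    static int[] initC2B(String b2c, int[] nr) {
        // The copy questioned in (26): needed here because the table is modified.
        char[] table = b2c.toCharArray();
        for (int b : nr) {
            table[b] = UNMAPPABLE;          // drop the non-roundtrip entry first
        }
        int[] c2b = new int[0x10000];
        java.util.Arrays.fill(c2b, -1);     // -1 means "char not encodable"
        for (int b = 0; b < table.length; b++) {
            if (table[b] != UNMAPPABLE) {
                c2b[table[b]] = b;          // invert the remaining mappings
            }
        }
        return c2b;
    }
}
```

With a b2c table that maps both 0x15 and 0x25 to U+000A and an nr list containing 0x25, the resulting c2b entry for U+000A is 0x15. Without the nr step, the later byte 0x25 would overwrite it in this sketch.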
Oops, yes, it was late, after hours of thinking digitally.
That made me wonder why I didn't have this problem in my code. I didn't have to manipulate the b2c map, because I moved all the NRs into the *.irregularities map file, which you call *.c2b and which in fact overrides entries of the c2b map generated from b2c. (BTW, in *.nr the 2nd value is redundant and could be omitted.) So if we have
    15 --> 000A
    25 --> 000A
in *.map, then instead of
    25 (--> 000A)
in *.nr, we could have
    15 <-- 000A
in *.c2b.
So avoiding the copy of the whole b2c map should be an additional serious argument for my suggestion (21), which I must correct:
(21) Join the *.nr files into the *.c2b files (25->000a becomes 000a->15):
    Benefit[21]: reduces the number of files
    Benefit[22]: simplifies initC2B() (avoids 2 loops + saves copying the whole b2c map)
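A minimal sketch of what suggestion (21) could look like, again simplified to single-byte tables. The class name `C2bOverrideSketch`, the `{char, byte}` pair layout for the *.c2b entries, and the U+FFFD marker are assumptions for illustration only, not the real generator or table format.

```java
// Hypothetical sketch of suggestion (21): b2c stays read-only, *.c2b overrides win.
class C2bOverrideSketch {
    // Assumed marker for "no mapping" in the b2c table.
    static final char UNMAPPABLE = '\uFFFD';

    // b2c:       String indexed by byte value, never modified.
    // overrides: {char, byte} pairs, e.g. {0x000A, 0x15} for "15 <-- 000A".
    static int[] initC2B(String b2c, int[][] overrides) {
        int[] c2b = new int[0x10000];
        java.util.Arrays.fill(c2b, -1);     // -1 means "char not encodable"
        for (int b = 0; b < b2c.length(); b++) {
            char c = b2c.charAt(b);         // plain String access, no char[] copy
            if (c != UNMAPPABLE) {
                c2b[c] = b;                 // straightforward inversion of b2c
            }
        }
        for (int[] o : overrides) {
            c2b[o[0]] = o[1];               // *.c2b entry overrides the generated one
        }
        return c2b;
    }
}
```

Here the single override 000A -> 15 replaces both the separate pass over the .nr entries and the copy of the b2c table; the inversion loop and the override loop are the only two passes left.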
-Ulf