Rewrite of IBM doublebyte charsets

Ulf Zibis Ulf.Zibis at gmx.de
Mon May 18 13:54:46 UTC 2009


On 17.05.2009 23:00, Xueming Shen wrote:

Ulf Zibis wrote:

*** Encoder-Suggestions:

(26) Why copy the Strings to char[] in initC2B()? String access should be just as fast:
    - char[] sb = b2cSB.toCharArray();
    - char[] db = b2c[i].toCharArray();

-Ulf

Because, if there is a b2cNR table, the b2c tables need to be updated before they are used to generate the c2b tables. (A b2cNR table means that multiple "bytes" are mapped to the same "char"; when going c->b we need to know which "bytes" to map to, and that is specified in the .nr map.) In theory we only need to do this if b2cNR is present, but I don't want to keep two paths. A possible optimization is to pass in char[] instead of String, and then only make a copy when necessary.
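To make the reason for the copy concrete, here is a minimal sketch, not the actual JDK initC2B(). It assumes a single-byte table for brevity, and the class name, method names and the UNMAPPABLE marker are all hypothetical. The decode-only (NR) bytes are blanked out in a mutable copy of the b2c table before it is inverted, which is why a String alone is not enough.

```java
import java.util.Arrays;

class NrCopySketch {
    // Hypothetical marker for "no mapping"; an assumption for this sketch.
    static final char UNMAPPABLE = '\uFFFD';

    // Sample b2c table where bytes 0x15 and 0x25 both decode to U+000A.
    static String sampleB2C() {
        char[] t = new char[0x26];
        Arrays.fill(t, UNMAPPABLE);
        t[0x15] = '\n';
        t[0x25] = '\n';
        return new String(t);
    }

    // Builds a char -> byte table from a byte -> char table.
    // nrBytes lists the byte values that must not be produced when encoding.
    static int[] buildC2B(String b2c, int[] nrBytes) {
        char[] copy = b2c.toCharArray();      // String is immutable, so patch a copy
        for (int b : nrBytes) {
            copy[b] = UNMAPPABLE;             // drop the decode-only entries
        }
        int[] c2b = new int[0x10000];
        Arrays.fill(c2b, -1);                 // -1 means "char has no byte"
        for (int b = 0; b < copy.length; b++) {
            char c = copy[b];
            if (c != UNMAPPABLE) {
                c2b[c] = b;
            }
        }
        return c2b;
    }

    public static void main(String[] args) {
        String b2c = sampleB2C();
        // With 0x25 listed as NR, U+000A encodes to 0x15.
        System.out.println(Integer.toHexString(buildC2B(b2c, new int[] {0x25})['\n']));
        // Without the NR list, the later duplicate 0x25 silently wins.
        System.out.println(Integer.toHexString(buildC2B(b2c, new int[0])['\n']));
    }
}
```

In this sketch the copy is made unconditionally, mirroring the "one path" choice described above; copying only when the NR list is non-empty would be the optimization mentioned.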

Oops, yes, it was late after hours of thinking digital.

While thinking about why I didn't have this problem in my code: I didn't have to manipulate the b2c map, because I moved all the NRs into the *.irregularities map file, which you called *.c2b, and which in effect overrides entries of the c2b map generated from b2c. (BTW, in *.nr the 2nd value is redundant and could be dropped.) So if we have
    15 --> 000A
    25 --> 000A
in *.map, then instead of
    25 (--> 000A)
in *.nr, we could have
    15 <-- 000A
in *.c2b.

So avoiding the copy of the whole b2c map should be an additional serious argument for my suggestion (21), which I must correct:

(21) merge the *.nr files into the *.c2b files (25->000a becomes 000a->15):
    Benefit[21]: reduces the number of files
    Benefit[22]: simplifies initC2B() (avoids 2 loops + saves copying the whole b2c map)
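A rough sketch of what suggestion (21) could look like, again with a single-byte table for brevity and with hypothetical names throughout (this is not existing JDK code). The b2c String is only read via charAt(), so no copy is needed, and the entries from the merged *.c2b file are applied afterwards as overrides.

```java
import java.util.Arrays;

class C2bOverrideSketch {
    // Hypothetical marker for "no mapping"; an assumption for this sketch.
    static final char UNMAPPABLE = '\uFFFD';

    // Sample b2c table where bytes 0x15 and 0x25 both decode to U+000A.
    static String sampleB2C() {
        char[] t = new char[0x26];
        Arrays.fill(t, UNMAPPABLE);
        t[0x15] = '\n';
        t[0x25] = '\n';
        return new String(t);
    }

    // overrides holds {char, byte} pairs, as they might be parsed
    // from a merged *.c2b file (e.g. 000A -> 15).
    static int[] buildC2B(String b2c, int[][] overrides) {
        int[] c2b = new int[0x10000];
        Arrays.fill(c2b, -1);                  // -1 means "char has no byte"
        for (int b = 0; b < b2c.length(); b++) {
            char c = b2c.charAt(b);            // read-only access, no copy
            if (c != UNMAPPABLE) {
                c2b[c] = b;                    // for duplicates the last byte wins for now
            }
        }
        for (int[] o : overrides) {
            c2b[o[0]] = o[1];                  // the *.c2b entry has the final word
        }
        return c2b;
    }

    public static void main(String[] args) {
        int[][] overrides = { {0x000A, 0x15} };
        int[] c2b = buildC2B(sampleB2C(), overrides);
        System.out.println(Integer.toHexString(c2b[0x000A]));
    }
}
```

Compared with blanking NR bytes in a copy, this needs one inversion loop plus one short override loop, at the cost of the override file having to name the winning byte explicitly.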

-Ulf


