Fast String...
Ulf Zibis Ulf.Zibis at gmx.de
Wed Mar 25 14:12:36 UTC 2009
- Previous message: hg: jdk7/tl/jdk: 6800572: Removing elements from views of NavigableMap implementations does not always work correctly.
- Next message: hg: jdk7/tl/jdk: 6819122: DefaultProxySelector should lazily initialize the Pattern object and the NonProxyInfo objects
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 25.03.2009 04:41, Xueming Shen wrote:
Ulf Zibis wrote:
On 25.03.2009 02:13, Xueming Shen wrote:
Reducing size is a good thing. That was my primary goal: to get charsets.jar under 2 MB, which is doable if we put the data outside the class file, and that is what I have done... The concern is startup time. One alternative is to use this approach only for charsets where startup doesn't matter, such as the IBM charsets and the ones on Solaris :-)
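For illustration only, a sketch of what "data outside the class file" can look like: one loader reads a b2c table from a .dat resource at startup. The file layout (256 big-endian chars, one per byte value), the class name and the resource name are assumptions, not the layout used in either patch. The read in readB2C is the point where the startup cost discussed in this thread would be paid.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class DatTableLoader {
    static final int TABLE_BYTES = 256 * 2; // hypothetical layout: 256 big-endian chars

    /** Reads a b2c table (one char per byte value) from an already-open stream. */
    static char[] readB2C(InputStream in) {
        byte[] raw = new byte[TABLE_BYTES];
        try {
            // One bulk read instead of 256 readChar() calls on a possibly unbuffered stream.
            new DataInputStream(in).readFully(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        char[] b2c = new char[256];
        for (int i = 0; i < 256; i++) {
            b2c[i] = (char) (((raw[2 * i] & 0xFF) << 8) | (raw[2 * i + 1] & 0xFF));
        }
        return b2c;
    }

    /** Loads a table shipped as a resource next to this class; the name is hypothetical. */
    static char[] loadResource(String name) throws IOException {
        try (InputStream in = DatTableLoader.class.getResourceAsStream(name)) {
            if (in == null) {
                throw new IOException("missing resource: " + name);
            }
            return readB2C(in);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a real .dat file: the ISO-8859-1 identity table.
        byte[] dat = new byte[TABLE_BYTES];
        for (int b = 0; b < 256; b++) {
            dat[2 * b + 1] = (byte) b; // high byte stays 0
        }
        char[] b2c = readB2C(new ByteArrayInputStream(dat));
        System.out.println((int) b2c[0xE9]); // 233
    }
}
```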
Comparing storing the data in the class file versus outside it: even with the data in the class file you can still eliminate the c2b data (generated from b2c). The difference is that String constants stored in UTF-8 probably take 3 bytes per char, but only 2 bytes in a ".dat" file... about 15%.
Your generated charset classes average 2 KB; my data files average 250 bytes (including aliases + historicalName, so you should subtract 50..200 bytes for comparison). See: https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/releases/niocharsetM4.jar?rev=682&view=log
That's unfair :-) you put me totally on the defensive :-) Martin can testify that I started selling this idea 2 years ago: extracting all mapping data into a .dat file and having one single base class that loads the .dat and constructs the charset class on the fly :-) So I know how small it can be. My 15% figure is not for single-byte; I'm talking about the double-byte charsets,
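To make the 3-versus-2-byte point concrete: class-file string constants use modified UTF-8, in which every char from U+0800 to U+FFFF (the range holding most CJK mappings) takes 3 bytes, while a .dat file can store each char in 2. The sketch below measures both with DataOutputStream.writeUTF, which writes the same modified UTF-8 encoding. The class and method names are made up for illustration, and the overall "about 15%" depends on the real mix of chars in the tables, which this toy input does not reproduce.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ConstantSize {
    /** Size of s in modified UTF-8 (the class-file string encoding), without the 2-byte length prefix. */
    static int modifiedUtf8Length(String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(s); // 2-byte length, then modified UTF-8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.size() - 2;
    }

    /** Size of s stored as raw 2-byte chars, as in a ".dat" file. */
    static int rawCharLength(String s) {
        return 2 * s.length();
    }

    public static void main(String[] args) {
        // Toy fragment of a double-byte b2c table: CJK chars in U+0800..U+FFFF.
        String cjk = "\u4E00\u4E01\u4E03\u4E07\u4E08";
        System.out.println(modifiedUtf8Length(cjk) + " vs " + rawCharLength(cjk)); // 15 vs 10
        // ASCII chars take 1 byte in modified UTF-8, which favors the class file.
        String ascii = "ABCDE";
        System.out.println(modifiedUtf8Length(ascii) + " vs " + rawCharLength(ascii)); // 5 vs 10
    }
}
```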
Ah, ok. This makes it clearer.
I totally agree with you; saving bytes only in single-byte charsets isn't worth much. But it was a good exercise for me to find out the relevant techniques. You might wonder how I can serve a coder for 256 2-byte chars with a 69-byte data file (e.g. koi8-u.dat), which also includes its numerous names. The trick is that I share map data between charsets if they are similar enough. This is done by my sun.nio.cs.CharsetStream class.
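A minimal sketch of the map-sharing idea, assuming a simple representation (a base b2c table plus a list of byte-to-char overrides) that is not the actual sun.nio.cs.CharsetStream format. ISO-8859-15 is expressed as ISO-8859-1 with its eight changed positions, and the c2b (encode) table is generated from b2c at load time, so only b2c would need to be stored. Class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: a single-byte charset described as a base b2c table plus overrides. */
public class SharedSingleByteMap {
    /** ISO-8859-1 maps every byte 0x00..0xFF to the char with the same value. */
    static char[] latin1B2C() {
        char[] b2c = new char[256];
        for (int b = 0; b < 256; b++) {
            b2c[b] = (char) b;
        }
        return b2c;
    }

    /** Copies the base table and applies overrides given as {byte, char} pairs. */
    static char[] derive(char[] base, int[][] overrides) {
        char[] b2c = Arrays.copyOf(base, base.length);
        for (int[] o : overrides) {
            b2c[o[0]] = (char) o[1];
        }
        return b2c;
    }

    /** Builds the c2b (encode) table from b2c; 0 means unmappable (except for U+0000 itself). */
    static byte[] c2bFrom(char[] b2c) {
        byte[] c2b = new byte[0x10000];
        for (int b = 0; b < b2c.length; b++) {
            c2b[b2c[b]] = (byte) b;
        }
        return c2b;
    }

    // ISO-8859-15 differs from ISO-8859-1 at exactly these eight positions.
    static final int[][] LATIN9_OVERRIDES = {
        {0xA4, 0x20AC}, {0xA6, 0x0160}, {0xA8, 0x0161}, {0xB4, 0x017D},
        {0xB8, 0x017E}, {0xBC, 0x0152}, {0xBD, 0x0153}, {0xBE, 0x0178}
    };

    public static void main(String[] args) {
        char[] latin9 = derive(latin1B2C(), LATIN9_OVERRIDES);
        byte[] c2b = c2bFrom(latin9);
        System.out.printf("0xA4 -> U+%04X%n", (int) latin9[0xA4]); // 0xA4 -> U+20AC
        System.out.printf("U+20AC -> 0x%02X%n", c2b[0x20AC] & 0xFF); // U+20AC -> 0xA4
    }
}
```

Stored this way, the derived charset costs 8 pairs instead of a full table; the real savings in CharsetStream may come from a different encoding of the differences.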
I would be surprised if there weren't heavy overlap between double-byte maps that could be shared. I designed the CharsetStream class to be extensible for double-byte requirements. Additionally, I think it should be possible to partly share mapping tables in memory, as the double-byte b2c maps generally seem to be sliced.
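A hypothetical sketch of sharing sliced double-byte tables in memory: the b2c map is kept as up to 256 slices of 256 chars, one per lead byte, and identical slices (within one charset or across charsets) are interned in a pool so only one copy exists. The slice size, the pool and all names are assumptions; the slice contents in main are toy data, not a real mapping.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a double-byte b2c table split into per-lead-byte slices that can be shared. */
public class SlicedB2C {
    /** Pool of distinct slices keyed by content; tables built with the same pool share arrays. */
    static final class SlicePool {
        private final Map<String, char[]> pool = new HashMap<>();

        char[] intern(char[] slice) {
            // new String(slice) copies the chars, so the key reflects the slice content.
            return pool.computeIfAbsent(new String(slice), k -> slice);
        }

        int size() {
            return pool.size();
        }
    }

    private final char[][] slices = new char[256][];

    /** rawSlices[lead] is a 256-char slice, or null if no code starts with that lead byte. */
    SlicedB2C(char[][] rawSlices, SlicePool pool) {
        for (int lead = 0; lead < 256; lead++) {
            char[] s = rawSlices[lead];
            slices[lead] = (s == null) ? null : pool.intern(s);
        }
    }

    /** Decodes one two-byte code; returns U+FFFD when the lead byte has no slice. */
    char decode(int lead, int trail) {
        char[] s = slices[lead & 0xFF];
        return (s == null) ? '\uFFFD' : s[trail & 0xFF];
    }

    char[] slice(int lead) {
        return slices[lead & 0xFF];
    }

    public static void main(String[] args) {
        SlicePool pool = new SlicePool();
        char[] common = new char[256];
        for (int t = 0; t < 256; t++) {
            common[t] = (char) (0x4E00 + t); // toy slice, not real mapping data
        }
        char[][] rawA = new char[256][];
        char[][] rawB = new char[256][];
        rawA[0x81] = common.clone();
        rawB[0x81] = common.clone(); // same content, different array
        SlicedB2C a = new SlicedB2C(rawA, pool);
        SlicedB2C b = new SlicedB2C(rawB, pool);
        System.out.println(a.slice(0x81) == b.slice(0x81)); // true: one shared array
        System.out.println(pool.size()); // 1
    }
}
```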
The big problem is the startup-time lag, which to me seems to be caused by the sluggish resource stream.
-Ulf
Let me explain why I don't really care about the single-byte size. We probably have 100 single-byte charsets in our repository; assuming each takes 2 KB, that's a total of 200 KB out of the 6 MB+ (stored, i.e. uncompressed) size of charsets.jar, so even if you could reduce them to 0, it would be under 5% of the total. Yes, every bit counts, but sometimes you have to balance the advantages and disadvantages, so if we have to trade startup time for a reduction of under 5% of a 6 MB charsets.jar, I would give it a second thought. But it might be a totally different story for double-byte: if you can cut the 6 MB in half (that was my goal) with a relatively small startup regression, it might be worth doing.
Sherman
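Taking Sherman's round numbers at face value (about 100 single-byte charsets at roughly 2 KB each, and a charsets.jar of about 6 MB, with 1 MB = 1000 KB), the comparison works out as follows; halving the jar would save about 15 times as much as eliminating the single-byte data entirely:

```latex
\frac{100 \times 2\,\mathrm{KB}}{6\,\mathrm{MB}} = \frac{200\,\mathrm{KB}}{6000\,\mathrm{KB}} \approx 3.3\,\%
\qquad\text{vs.}\qquad
\tfrac{1}{2} \times 6\,\mathrm{MB} = 3000\,\mathrm{KB} = 15 \times 200\,\mathrm{KB}
```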