Review request for 6636323_6636319

Xueming Shen Xueming.Shen at Sun.COM
Fri Mar 20 18:25:24 UTC 2009


The change has been (and is still being) reviewed by Alan and Ulf. I am sending it to the alias to see whether anyone else is interested in taking a look (Ulf suggested we should be more open :-)

http://cr.openjdk.java.net/~sherman/6636323_6636319/webrev

(a) This is purely a performance improvement for new String(byte[], cs/csn) and String.getBytes(cs/csn) when the data size is relatively small. (We're not ready to add these methods to CharsetDe/Encoder for now.)

The preliminary micro-benchmark data is at http://cr.openjdk.java.net/~sherman/6636323_6636319/benchmark.txt

(b) This currently covers ASCII, ISO-8859-1 and all SingleByte-based charsets. I have yet to find time to migrate the DB (double-byte) charsets.

Thanks, Sherman

A simple write-up of the changes:

Problem/issue to solve: StringCoding.java is "slow" and creates "too many" objects when doing byte[]<->String conversion.

Root cause: There are "too many" layers of logic before the byte[]/char[] reaches the actual decoding/encoding code, and again on the way back. A ByteBuffer/CharBuffer wrapper pair is always created for each conversion. While the GC should be pretty good these days at cleaning up these wrapper objects quickly, creating and cleaning them up is still a waste of CPU and memory if it is not really necessary.
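To make the root cause concrete, here is a minimal sketch of the general buffer-based decode path, using only the public java.nio.charset API. BufferPathDecode is a hypothetical name and this is not the actual StringCoding code; it just shows the per-call decoder, the ByteBuffer/CharBuffer wrapper pair, and the final trimming copy.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper approximating the buffer-based byte[] -> char[] path.
final class BufferPathDecode {
    static char[] decode(Charset cs, byte[] ba, int off, int len) {
        CharsetDecoder cd = cs.newDecoder()                  // allocation: decoder
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        char[] ca = new char[(int) (len * (double) cd.maxCharsPerByte())];
        ByteBuffer bb = ByteBuffer.wrap(ba, off, len);       // allocation: wrapper #1
        CharBuffer cb = CharBuffer.wrap(ca);                 // allocation: wrapper #2
        CoderResult cr = cd.decode(bb, cb, true);
        if (cr.isUnderflow()) {
            cr = cd.flush(cb);
        }
        if (!cr.isUnderflow()) {
            throw new IllegalStateException("unexpected result: " + cr);
        }
        // Trim to the number of chars actually produced (often one more copy).
        return Arrays.copyOf(ca, cb.position());
    }

    public static void main(String[] args) {
        byte[] in = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(decode(StandardCharsets.US_ASCII, in, 1, 3))); // prints ell
    }
}
```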

Two "facts/details" that we can take advantage of:

(1) StringCoding always performs REPLACE when it encounters malformed or unmappable input sequences.
(2) The input and output byte[]/char[] are entirely under our "control" (the output array is allocated for the worst case up front), so the decoding/encoding should never "overflow".
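A sketch of what these two facts buy us on the encode side, again using only the public API (ReplaceNoOverflowEncode is a made-up name): with REPLACE, bad input never surfaces as an error result, and because we allocate the output ourselves at len * maxBytesPerChar(), the encoder never reports OVERFLOW. This assumes a stateless charset such as ISO-8859-1 or UTF-8; note that CharsetEncoder already requires the replacement to be no longer than maxBytesPerChar().

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: encodes chars relying on facts (1) and (2).
final class ReplaceNoOverflowEncode {
    static byte[] encode(Charset cs, char[] ca, int off, int len) {
        CharsetEncoder ce = cs.newEncoder()
                // Fact (1): bad input is always replaced, never reported.
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        // Fact (2): we size the output for the worst case ourselves,
        // so the encoder never runs out of room.
        byte[] ba = new byte[(int) (len * (double) ce.maxBytesPerChar())];
        ByteBuffer bb = ByteBuffer.wrap(ba);
        CoderResult cr = ce.encode(CharBuffer.wrap(ca, off, len), bb, true);
        if (cr.isUnderflow()) {
            cr = ce.flush(bb);
        }
        // Under these assumptions neither an error result nor OVERFLOW occurs.
        if (!cr.isUnderflow()) {
            throw new AssertionError("unexpected result: " + cr);
        }
        return Arrays.copyOf(ba, bb.position());
    }

    public static void main(String[] args) {
        char[] text = "caf\u00e9 \u20ac5".toCharArray();
        byte[] latin1 = encode(StandardCharsets.ISO_8859_1, text, 0, text.length);
        System.out.println(latin1.length); // prints 7 (the euro sign becomes '?')
    }
}
```

Since neither an error nor OVERFLOW can come back, a dedicated array path does not need the CoderResult plumbing at all, which is presumably what makes the fastpath below worthwhile.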

Changes:

(1) Two new internal interfaces, sun.nio.cs.ArrayDecoder and sun.nio.cs.ArrayEncoder, provide a byte[] <-> char[] fastpath alongside the otherwise "well-encapsulated, X-Buffer only" CharsetDe/Encoder interface.

(2) US_ASCII/ISO_8859_1/SingleByte.Decoder/Encoder implement the above interfaces.

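(3) US_ASCII/ISO_8859_1/SingleByte.Decoder/Encoder also override isLegalReplacement(). The CharsetEncoder constructor validates its replacement bytes through isLegalReplacement(), so this significantly speeds up creating a new CharsetEncoder, which has a big impact on getBytes(charset).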

(4) StringCoding.java
a) Uses the ArrayDecoder/ArrayEncoder interfaces when possible (an instanceof check).
b) Adds an "isTrusted" field, set when creating the StringDecoder/StringEncoder, to indicate that the charset comes from the system class loader. Invoking cs.getClass().getClassLoader0() is "expensive", and paying that cost every time len == ba.length when a SecurityManager is installed is unnecessary; this helps the benchmark a lot when a SecurityManager is installed.
c) No longer creates a StringDecoder/StringEncoder in the "param is a Charset" cases, and avoids the defensive copy when it is not necessary.
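Changes (1), (2) and (4a) can be sketched together as follows. Everything here is hypothetical: ArrayDecoder is a stand-in written for this example (the real sun.nio.cs interfaces are internal; see the webrev for their exact signatures), ToyAscii is an invented charset whose decoder implements both the buffer API and the array path, and StringCodingSketch.decode only approximates the instanceof dispatch. The isTrusted/SecurityManager handling from (4b) is left out.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.util.Arrays;

// Hypothetical stand-in for the internal sun.nio.cs.ArrayDecoder.
interface ArrayDecoder {
    // Decodes src[off, off + len) into dst; returns the number of chars written.
    int decode(byte[] src, int off, int len, char[] dst);
}

// Toy decode-only ASCII charset; its decoder offers both paths.
final class ToyAscii extends Charset {
    ToyAscii() { super("X-Toy-ASCII", new String[0]); }
    @Override public boolean contains(Charset cs) { return cs instanceof ToyAscii; }
    @Override public CharsetDecoder newDecoder() { return new Decoder(this); }
    @Override public boolean canEncode() { return false; }
    @Override public CharsetEncoder newEncoder() { throw new UnsupportedOperationException(); }

    static final class Decoder extends CharsetDecoder implements ArrayDecoder {
        Decoder(Charset cs) { super(cs, 1.0f, 1.0f); }

        // Buffer path required by CharsetDecoder.
        @Override protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
            while (in.hasRemaining()) {
                byte b = in.get();
                if (b < 0) {  // not ASCII: report malformed, leave byte unconsumed
                    in.position(in.position() - 1);
                    return CoderResult.malformedForLength(1);
                }
                if (!out.hasRemaining()) {
                    in.position(in.position() - 1);
                    return CoderResult.OVERFLOW;
                }
                out.put((char) b);
            }
            return CoderResult.UNDERFLOW;
        }

        // Array fastpath: no buffers; non-ASCII bytes are always REPLACEd.
        @Override public int decode(byte[] src, int off, int len, char[] dst) {
            char repl = replacement().charAt(0);
            for (int i = 0; i < len; i++) {
                byte b = src[off + i];
                dst[i] = b >= 0 ? (char) b : repl;
            }
            return len;
        }
    }
}

final class StringCodingSketch {
    static char[] decode(Charset cs, byte[] ba, int off, int len) {
        CharsetDecoder cd = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        if (cd instanceof ArrayDecoder) {
            // Fastpath: array to array; copy only if the result is shorter.
            char[] ca = new char[(int) (len * (double) cd.maxCharsPerByte())];
            int n = ((ArrayDecoder) cd).decode(ba, off, len, ca);
            return n == ca.length ? ca : Arrays.copyOf(ca, n);
        }
        // General path: wrap the input and let the buffer API do the work.
        try {
            CharBuffer cb = cd.decode(ByteBuffer.wrap(ba, off, len));
            char[] out = new char[cb.remaining()];
            cb.get(out);
            return out;
        } catch (CharacterCodingException e) {
            throw new IllegalStateException(e);  // not expected with REPLACE
        }
    }

    public static void main(String[] args) {
        byte[] in = {'o', 'k', (byte) 0x80};
        System.out.println(new String(decode(new ToyAscii(), in, 0, in.length)));
    }
}
```

The fastpath never touches a ByteBuffer or CharBuffer and copies only when the presized array turns out to be too long; charsets without the array interface still take the regular CharsetDecoder.decode(ByteBuffer) route.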

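For change (3): the CharsetEncoder constructor installs the replacement through replaceWith(), which calls isLegalReplacement(), and the default implementation checks the bytes by decoding them with a decoder for the charset (newly created for a fresh encoder). An encoder that knows its replacement is legal can short-circuit that. ToyAsciiEncoder below is a hypothetical sketch, not the JDK class; it borrows StandardCharsets.US_ASCII as its charset only to stay short.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical toy ASCII encoder; the point of interest is isLegalReplacement().
final class ToyAsciiEncoder extends CharsetEncoder {
    ToyAsciiEncoder() {
        // Installs the default replacement {'?'} via replaceWith(),
        // which calls the isLegalReplacement() override below.
        super(StandardCharsets.US_ASCII, 1.0f, 1.0f);
    }

    // Cheap check: any single ASCII byte is a legal replacement. Anything else
    // falls back to the default check, which decodes the bytes.
    @Override
    public boolean isLegalReplacement(byte[] repl) {
        return (repl.length == 1 && repl[0] >= 0) || super.isLegalReplacement(repl);
    }

    @Override
    protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
        while (in.hasRemaining()) {
            char c = in.get();
            if (c >= 0x80) {  // not ASCII (surrogates included): unmappable
                in.position(in.position() - 1);
                return CoderResult.unmappableForLength(1);
            }
            if (!out.hasRemaining()) {
                in.position(in.position() - 1);
                return CoderResult.OVERFLOW;
            }
            out.put((byte) c);
        }
        return CoderResult.UNDERFLOW;
    }

    // Encodes s, replacing unmappable chars with the replacement bytes.
    static byte[] encodeReplacing(String s) {
        CharsetEncoder enc = new ToyAsciiEncoder()
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            ByteBuffer bb = enc.encode(CharBuffer.wrap(s));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        } catch (CharacterCodingException e) {
            throw new IllegalStateException(e);  // not expected with REPLACE
        }
    }

    public static void main(String[] args) {
        System.out.println(new String(encodeReplacing("h\u00e9llo"), StandardCharsets.US_ASCII)); // prints h?llo
    }
}
```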


More information about the core-libs-dev mailing list