Codereview request for 7183053: Optimize DoubleByte charset for String.getBytes()/new String(byte[])

Xueming Shen xueming.shen at oracle.com
Fri Jul 13 17:09:36 UTC 2012


On 07/13/2012 05:19 AM, Alan Bateman wrote:

On 11/07/2012 00:11, Xueming Shen wrote:

Hi,

In JDK7, the decoder and encoder implementations of most of our single-byte charsets and the UTF-8 charset are optimized to implement the internal interface sun.nio.cs.ArrayDecoder/Encoder to provide a fast path for String.getBytes(...) and new String(byte[]...) operations. I have an old blog entry regarding this optimization at

https://blogs.oracle.com/xuemingshen/entry/fasternewstringbytescs

This RFE, as the follow-up to the above changes, is to implement ArrayDe/Encoder for most of the sun.nio.cs.ext.DoubleByte based double-byte charsets.

Here is the webrev

http://cr.openjdk.java.net/~sherman/7183053/webrev

I've taken a pass over this and it's great to see DoubleByte.Decoder/Encoder implementing sun.nio.cs.ArrayDecoder/Encoder. The results look good too: there is a small number of regressions (Big5 at len=32, for example), but this is a micro benchmark and I'm sure there are fluctuations.

I don't see anything obviously wrong with the EBCDIC changes, although I'd need a history book to remember how the shifts between DBCS and SBCS work. I think our tests are good for this area, so I'm happy.

One minor nit is the continue in both encode methods; I think it would be cleaner to use "else if (bb ..." instead.
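For context, the two public operations named in the subject line can be exercised with a double-byte charset roughly as follows. This is only a minimal sketch: the class name DoubleByteRoundTrip and the choice of Big5 as the example charset are assumptions made for illustration, and whether the internal ArrayDecoder/ArrayEncoder fast path is taken depends on the JDK build in use.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
final class DoubleByteRoundTrip {
    // String.getBytes(Charset): the encode direction discussed in this thread.
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    // new String(byte[], Charset): the decode direction discussed in this thread.
    static String decode(byte[] b, Charset cs) {
        return new String(b, cs);
    }

    public static void main(String[] args) {
        String name = "Big5"; // assumed example; extended charsets may be absent from a trimmed runtime
        if (!Charset.isSupported(name)) {
            System.out.println(name + " is not available in this runtime");
            return;
        }
        Charset cs = Charset.forName(name);
        String text = "\u4e2d\u6587"; // two CJK characters
        byte[] bytes = encode(text, cs);
        String back = decode(bytes, cs);
        System.out.println(bytes.length + " bytes, round trip ok: " + text.equals(back));
    }
}
```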

The continue might make the vm happy, but this is the code I did last Oct, so I might be wrong. I will give it a couple of runs later with "else".
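To make the two loop shapes under discussion concrete, here is a small sketch. It is not the code from the webrev: the class EncodeLoopShapes, the constants, and the toy encodeChar mapping are all hypothetical stand-ins. It only shows that a "continue" after the unmappable case and an "else if" chain produce the same bytes, so the choice is about readability and about how the vm compiles the loop.

```java
import java.util.Arrays;

// Hypothetical toy encoder; the mapping below is NOT a real DoubleByte table.
final class EncodeLoopShapes {
    static final int UNMAPPABLE = 0xFFFD;     // assumed marker for "no mapping"
    static final int MAX_SINGLEBYTE = 0xFF;   // assumed single-byte upper bound
    static final byte REPLACEMENT = (byte) '?';

    // Toy mapping: ASCII is one byte, U+3040..U+30FF is two bytes, the rest is unmappable.
    static int encodeChar(char c) {
        if (c < 0x80) return c;
        if (c >= 0x3040 && c < 0x3100) return 0x8000 | (c - 0x3040);
        return UNMAPPABLE;
    }

    // Shape 1: handle the unmappable case, then "continue".
    static int encodeWithContinue(char[] src, byte[] dst) {
        int dp = 0;
        for (int sp = 0; sp < src.length; sp++) {
            int bb = encodeChar(src[sp]);
            if (bb == UNMAPPABLE) {
                dst[dp++] = REPLACEMENT;
                continue;
            }
            if (bb > MAX_SINGLEBYTE) {        // double-byte
                dst[dp++] = (byte) (bb >> 8);
                dst[dp++] = (byte) bb;
            } else {                          // single-byte
                dst[dp++] = (byte) bb;
            }
        }
        return dp;
    }

    // Shape 2: the same logic as a single if / else if / else chain.
    static int encodeWithElseIf(char[] src, byte[] dst) {
        int dp = 0;
        for (int sp = 0; sp < src.length; sp++) {
            int bb = encodeChar(src[sp]);
            if (bb == UNMAPPABLE) {
                dst[dp++] = REPLACEMENT;
            } else if (bb > MAX_SINGLEBYTE) { // double-byte
                dst[dp++] = (byte) (bb >> 8);
                dst[dp++] = (byte) bb;
            } else {                          // single-byte
                dst[dp++] = (byte) bb;
            }
        }
        return dp;
    }

    public static void main(String[] args) {
        char[] src = "a\u3042\u00e9".toCharArray();
        byte[] d1 = new byte[src.length * 2]; // worst case: two bytes per char
        byte[] d2 = new byte[src.length * 2];
        int n1 = encodeWithContinue(src, d1);
        int n2 = encodeWithElseIf(src, d2);
        System.out.println("same output: "
                + (n1 == n2 && Arrays.equals(Arrays.copyOf(d1, n1), Arrays.copyOf(d2, n2))));
    }
}
```

Any speed difference between the two shapes would have to be measured, as suggested above; the sketch makes no claim either way.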

I see in TestStringCoding.java that you've commented out the test that goes over the buffer limit - would I be correct to say that this isn't an issue and this happens with DB charsets today?

This is also true for the UTF-8 work I did last year, but UTF-8 is excluded at the beginning of the test. For SB, it takes advantage of the fact that the length of the output char[] is always the same as the length of the input bytes, so this can be checked once at the very beginning. For mb, checking both sp and dp slows down the de/encoding (the vm obviously does not like too many "if"s). Given this is an internal interface used exclusively by StringCoding, which has already calculated the max buf to feed in, I think this is something that can be optimized.
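The difference in bounds checking can be sketched as below. This is an illustration under assumptions, not the JDK source: the class BoundsCheckSketch and both toy mappings are hypothetical. The single-byte loop clamps the length once up front because one input byte always yields one char, while the multi-byte loop, if it had to be defensive, would test both the source index sp and the destination index dp on every iteration.

```java
// Hypothetical sketch; the mappings are toys, not real charset tables.
final class BoundsCheckSketch {
    // Single-byte: output length equals input length, so one up-front clamp is enough.
    static int decodeSingleByte(byte[] src, int sp, int len, char[] dst) {
        if (len > dst.length) {
            len = dst.length;                 // the only bounds check needed
        }
        for (int i = 0; i < len; i++) {
            dst[i] = (char) (src[sp + i] & 0xff);
        }
        return len;
    }

    // Double-byte toy: a lead byte >= 0x80 consumes a second byte, so input and
    // output advance at different rates and both indices are checked per iteration.
    static int decodeDoubleByte(byte[] src, int sp, int len, char[] dst) {
        int dp = 0;
        int sl = sp + len;
        while (sp < sl && dp < dst.length) {  // two checks on every pass
            int b1 = src[sp++] & 0xff;
            if (b1 < 0x80) {
                dst[dp++] = (char) b1;
            } else if (sp < sl) {
                int b2 = src[sp++] & 0xff;
                dst[dp++] = (char) ((b1 << 8) | b2);
            } else {
                dst[dp++] = '\uFFFD';         // truncated trailing lead byte
            }
        }
        return dp;
    }

    public static void main(String[] args) {
        byte[] in = {0x41, (byte) 0x81, 0x40, 0x42};
        System.out.println(decodeSingleByte(in, 0, in.length, new char[in.length]));
        System.out.println(decodeDoubleByte(in, 0, in.length, new char[in.length]));
    }
}
```

When the caller has already sized dst for the worst case, as described above for StringCoding, the "dp < dst.length" test in the second loop is the kind of check that could be dropped.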

-Sherman

Ulf - you've got several patches to the double byte charsets and I wonder if you have cycles to try Sherman's patch with jdk8 to see if there is any more to be gained?

-Alan.


