JDK 9 RFR of 6375303: Review use of caching in BigDecimal (original) (raw)

Peter Levart peter.levart at gmail.com
Sat Mar 22 09:01:10 UTC 2014


On 03/22/2014 02:01 AM, Brian Burkhalter wrote:

On Mar 20, 2014, at 12:49 AM, Aleksey Shipilev <aleksey.shipilev at oracle.com <mailto:aleksey.shipilev at oracle.com>> wrote:

On 03/20/2014 11:06 AM, Peter Levart wrote: I was thinking about last night, for question: "Why is this double-checked non-volatile-then-volatile trick not any faster than pure volatile variant even on ARM platform where volatile read should have some penalty compared to normal read?", might be in the fact that Raspberry Pi is a single-core/single-thread "machine". Would anyone with JVM JIT compiler expertise care to share some insight? I suspect that on such platform, the compiler optimizes volatile accesses so that they are performed without otherwise necessary memory fences...

Yes, at least C2 is known to not emit memory fences on uniprocessor machines. You need to have a multicore ARM. If you are still interested, contact me privately and I can arrange the access to my personal quad-core Cortex-A9. This is all very interesting but I would like to have closure so to speak on the patches that I posted.

Hi Brian,

I think the questions still unresolved are:

1) Is a change of toString() necessary?

I can't speak of that since I don't have empirical data in what proportion of code it would be beneficial. I can imagine situations where returned strings are used as keys in HashMap and String's equals is optimized for same-instance comparison, but I don't know if such situations happen in practice...

2) If #1 is “yes” then which version of the patch? (I think the most recent)

Aleksey Shipilev kindly let me access it's quad-code ARM "Big Iron" and before the connection to it broke yesterday, I managed to get the results of a single-threaded run of the microbenchmark. While differences were not spectacular, they were measurable (sorry I haven't saved the results). The variant with volatile stringCache field and CAS was a little slower than the double-checked nonvolatile-then-volatile-read-and-CAS trick.

It's also important that the fast-path method does not exceed the maximum bytecode size for inlining (35 bytes by default, I think) so the variant with two methods toString/toStringSlow is necessary to avoid performance regression.

3) Are the other changes OK which are unrelated to toString()?

Looks good. Just a nit. In the following method:

3726 private static void matchScale(BigDecimal[] val) { 3727 if (val[0].scale == val[1].scale) { 3728 // return 3729 } else if (val[0].scale < val[1].scale) { 3730 val[0] = val[0].setScale(val[1].scale, ROUND_UNNECESSARY); 3731 } else if (val[1].scale < val[0].scale) { 3732 val[1] = val[1].setScale(val[0].scale, ROUND_UNNECESSARY); 3733 } 3734 }

One of 3 ifs is superfluous. Either the 1st one:

 private static void matchScale(BigDecimal[] val) {
     */* if (val[0].scale == val[1].scale) {
         // return
     } else */*  if (val[0].scale < val[1].scale) {
         val[0] = val[0].setScale(val[1].scale, ROUND_UNNECESSARY);
     } else if (val[1].scale < val[0].scale) {
         val[1] = val[1].setScale(val[0].scale, ROUND_UNNECESSARY);
     }
 }

...or the last one:

 private static void matchScale(BigDecimal[] val) {
     if (val[0].scale == val[1].scale) {
         // return
     } else if (val[0].scale < val[1].scale) {
         val[0] = val[0].setScale(val[1].scale, ROUND_UNNECESSARY);
     } else*/* if (val[1].scale < val[0].scale) */*  {
         val[1] = val[1].setScale(val[0].scale, ROUND_UNNECESSARY);
     }
 }

Regards, Peter

Thanks, Brian



More information about the core-libs-dev mailing list