GetPrimitiveArrayCritical vs GetByteArrayRegion: 140x slow-down using -Xcheck:jni and java.util.zip.DeflaterOutputStream

Ian Rogers irogers at google.com
Thu Mar 15 01:00:13 UTC 2018


An old data point on how large a critical region should be comes from java.nio.Bits. In JDK 9 the code migrated into Unsafe, but in JDK 8 the copies performed within a critical region were bounded at 1MB: http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/native/java/nio/Bits.c#l88 This is inconsistent with Deflater and ObjectOutputStream, which both allow unlimited array sizes and thereby unbounded critical regions.
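For illustration (my sketch, not code from the thread), the unbounded case is easy to reproduce: Deflater accepts an arbitrarily large array in a single setInput call, so the native side is handed the whole array at once rather than 1MB slices. The 8MB size here is just an example well past the old Bits.c cap:

```java
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class WholeArrayDeflate {
    public static void main(String[] args) throws Exception {
        byte[] input = new byte[8 * 1024 * 1024]; // well past the old 1MB Bits.c cap
        for (int i = 0; i < input.length; i++) input[i] = (byte) (i % 251);

        Deflater deflater = new Deflater();
        deflater.setInput(input, 0, input.length); // whole array handed to native code at once
        deflater.finish();
        byte[] compressed = new byte[input.length];
        int clen = 0;
        while (!deflater.finished()) {
            clen += deflater.deflate(compressed, clen, compressed.length - clen);
        }
        deflater.end();

        // Round-trip to check the single-shot compression was complete.
        Inflater inflater = new Inflater();
        inflater.setInput(compressed, 0, clen);
        byte[] restored = new byte[input.length];
        int rlen = 0;
        while (!inflater.finished()) {
            rlen += inflater.inflate(restored, rlen, restored.length - rlen);
        }
        inflater.end();
        System.out.println(rlen == input.length && java.util.Arrays.equals(input, restored));
    }
}
```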

In JDK 9 the copies can starve the garbage collector in nio's Bits too, as there is no longer a 1MB limit on the copy sizes: http://hg.openjdk.java.net/jdk/jdk/rev/f70e100d3195 which came from: https://bugs.openjdk.java.net/browse/JDK-8149596

Perhaps this is a regression that hasn't been demonstrated because of the difficulty of testing for it. There is a time-to-safepoint discussion thread related to this here: https://groups.google.com/d/msg/mechanical-sympathy/f3g8pry-o1A/x6NptTDslcIJ "silent killer" It doesn't seem unreasonable to have the copy loops proceed in 1MB chunks, but JDK-8149596 lost this behavior, so I'm confused about what HotSpot's position is. In a way critical regions are better than Unsafe, as they may pin the memory rather than starve the GC, which is what Shenandoah does.
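To make the chunking idea concrete (my sketch under assumptions, not JDK code), the same 1MB bound can be imposed from the Java side by feeding Deflater its input in slices, so that each native call, and any critical region it may take internally, covers at most 1MB:

```java
import java.util.zip.Deflater;

public class ChunkedDeflate {
    static final int CHUNK = 1 << 20; // 1MB bound per native call, mirroring the old Bits.c limit

    // Compress `input`, never handing more than CHUNK bytes to a single setInput call.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        byte[] out = new byte[64 * 1024];
        int outLen = 0;
        for (int off = 0; off < input.length; off += CHUNK) {
            int len = Math.min(CHUNK, input.length - off);
            deflater.setInput(input, off, len); // at most 1MB visible to native code here
            while (!deflater.needsInput()) {
                if (outLen == out.length) out = java.util.Arrays.copyOf(out, out.length * 2);
                outLen += deflater.deflate(out, outLen, out.length - outLen);
            }
        }
        deflater.finish();
        while (!deflater.finished()) {
            if (outLen == out.length) out = java.util.Arrays.copyOf(out, out.length * 2);
            outLen += deflater.deflate(out, outLen, out.length - outLen);
        }
        deflater.end();
        return java.util.Arrays.copyOf(out, outLen);
    }

    public static void main(String[] args) {
        byte[] input = new byte[3 * CHUNK + 12345];
        for (int i = 0; i < input.length; i++) input[i] = (byte) (i * 31);
        byte[] compressed = compress(input);
        System.out.println(compressed.length > 0 && compressed.length < input.length);
    }
}
```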

Thanks, Ian

On Wed, Mar 7, 2018 at 10:16 AM Ian Rogers <irogers at google.com> wrote:

Thanks Martin! Profiling shows most of the time spent in this code is in the call to libz's deflate. I worry that increasing the buffer size increases that per-call work and holds the critical lock for longer. Profiling likely won't show this issue, as there needs to be contention on the GC locker.

In HotSpot: http://hg.openjdk.java.net/jdk/jdk/file/2854589fd853/src/hotspot/share/gc/shared/gcLocker.hpp#l34 "Avoid calling these if at all possible" could be taken to suggest that JNI critical regions should also be avoided if at all possible. I think HotSpot and the JDK are out of step if this is the case, and there could be work done to remove JNI critical regions from the JDK and replace them either with Java code (JITs are better now) or with Get/Set...ArrayRegion. That does appear to be an O(1) to O(n) transition, so perhaps the HotSpot folks could speak to it. Thanks, Ian

On Tue, Mar 6, 2018 at 6:44 PM Martin Buchholz <martinrb at google.com> wrote: Thanks Ian and Sherman for the excellent presentation and memories of ancient efforts.

Yes, Sherman, I still have a vague memory that attempts to touch any implementation detail in this area were asking for trouble and someone would complain. I was happy to let you deal with those problems!

There's a continual struggle in the industry to enable more checking at test time, and -Xcheck:jni does look like it should be possible to routinely turn on for running all tests. (Google tests run with a time limit, so any low-level performance regression immediately causes test failures, for better or worse.)

Our problem reduces to accessing a primitive array slice from native code. The only way to get O(1) access is via GetPrimitiveArrayCritical, BUT when it fails you have to pay for a copy of the entire array. An obvious solution is to introduce a slice variant, GetPrimitiveArrayRegionCritical, that would only degrade to a copy of the slice. Offhand that seems relatively easy to implement, though we would hold our noses at adding yet more Critical functions to the JNI spec. In spirit, though, it's a straightforward generalization.

Implementing Deflater in pure Java seems very reasonable and we've had good success with "nearby" code, but we likely cannot reuse the GNU Classpath code.

Thanks for pointing out JDK-6311046: -Xcheck:jni should support checking of GetPrimitiveArrayCritical, which went into jdk8 in u40.

We can probably be smarter about choosing a better buffer size, e.g. in ZipOutputStream. Here's an idea: in code like this

try (DeflaterOutputStream dout = new DeflaterOutputStream(deflated)) {
    dout.write(inflated, 0, inflated.length);
}

when the DeflaterOutputStream is given an input that is clearly too large for the current buffer size, reorganize its internals dynamically to use a much bigger buffer.

It's possible (but hard work!) to adjust algorithms based on whether critical array access is available. It would be nice if we could get the JVM to tell us (but it might depend, e.g. on the size of the array).
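As an aside on the buffer-size knob (my sketch; the 1MB figure is just an illustrative choice), DeflaterOutputStream already exposes a constructor taking the internal buffer size, so a caller who knows the input is large can opt into a bigger buffer today instead of the 512-byte default:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class BigBufferDeflate {
    public static void main(String[] args) throws Exception {
        byte[] inflated = new byte[4 * 1024 * 1024];
        for (int i = 0; i < inflated.length; i++) inflated[i] = (byte) (i % 97);

        ByteArrayOutputStream deflated = new ByteArrayOutputStream();
        // Third argument is the internal buffer size: 1MB here rather than the 512-byte default.
        try (DeflaterOutputStream dout =
                 new DeflaterOutputStream(deflated, new Deflater(), 1 << 20)) {
            dout.write(inflated, 0, inflated.length);
        }

        // Round-trip to verify the compressed stream is intact.
        byte[] restored = new byte[inflated.length];
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(deflated.toByteArray()))) {
            int n = 0, r;
            while (n < restored.length && (r = in.read(restored, n, restored.length - n)) != -1) {
                n += r;
            }
            System.out.println(n == inflated.length && java.util.Arrays.equals(inflated, restored));
        }
    }
}
```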



More information about the hotspot-dev mailing list