RFR: 8202381: (Solaris) SIGBUS in # V [libjvm.so+0xcee494] jni_GetIntField+0x224 (original) (raw)
Daniel D. Daugherty daniel.daugherty at oracle.com
Mon Apr 30 15:57:45 UTC 2018
- Previous message: RFR: 8202381: (Solaris) SIGBUS in # V [libjvm.so+0xcee494] jni_GetIntField+0x224
- Next message: RFR: 8202381: (Solaris) SIGBUS in # V [libjvm.so+0xcee494] jni_GetIntField+0x224
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 4/30/18 11:53 AM, Erik Österlund wrote:
Hi Dan,
On 2018-04-30 17:36, Daniel D. Daugherty wrote: On 4/30/18 9:32 AM, Erik Österlund wrote:
Hi,
It looks like I broke jniFastGetField on SPARC with my changes to modularize this feature (8200235). Sorry about that. And the backout of that (8202386) didn't go in due to maintenance. So here is a patch to fix the problem instead. Webrev: http://cr.openjdk.java.net/~eosterlund/8202381/webrev.00/ src/hotspot/cpu/aarch64/gc/shared/barrierSetAssembleraarch64.cpp src/hotspot/cpu/aarch64/gc/shared/barrierSetAssembleraarch64.hpp src/hotspot/cpu/aarch64/jniFastGetFieldaarch64.cpp Addition of the placeholder 'jnienv'; you went to the trouble of changing 'robj' -> 'obj' on the other platforms, but not here which is inconsistent. Fixed. I changed it here too. Full: http://cr.openjdk.java.net/~eosterlund/8202381/webrev.01/ Incremental: http://cr.openjdk.java.net/~eosterlund/8202381/webrev.0001/ src/hotspot/cpu/sparc/gc/shared/barrierSetAssemblersparc.cpp src/hotspot/cpu/sparc/gc/shared/barrierSetAssemblersparc.hpp Addition of the placeholder 'jnienv'; why rename 'robj' -> 'obj'? src/hotspot/cpu/sparc/jniFastGetFieldsparc.cpp Addition of the placeholder 'jnienv'; why rename 'robj' -> 'obj'? I changed the name because no other register in these files have the r prefix for the register variables (as they do in the jniFastGetField files), and I simply could not keep my fingers from fixing that inconsistency.
Sounds good. Thanks for explaining.
On SPARC, unlike x8664, the temporary register (G3) gets immediately clobbered after calling the tryresolvejobjectinnative function.
How/where? I just don't see it, but my SPARC assembly is rusty...
Therefore, I did not bother clobbering it afterwards as it will already be clobbered. On x8664 on the other hand, the temporary register (r8) is not clobbered by the rest of the code, so I decided to explicitly clobber it so that any mistakes here become more easily reproducible.
Yup. I liked that change.
L74: // Both O5 and G3 are clobbered by tryresolvejobjectinnative. You are passing 'G3' as 'tmp'. I don't see where/how in tryresolvejobjectinnative() that the 'tmp' parameter gets clobbered.
It is used by ZGC only to materialize a mask for catching stale pointers.
So G3 only gets clobbered with ZGC (which is not yet in this code base)?
src/hotspot/cpu/x86/gc/shared/barrierSetAssemblerx86.cpp src/hotspot/cpu/x86/gc/shared/barrierSetAssemblerx86.hpp Addition of the placeholder 'jnienv'; why rename 'robj' -> 'obj'?
src/hotspot/cpu/x86/jniFastGetFieldx8664.cpp L50: static const Register rtmp = r8; Nit: These seem to be sorted by register number and not by variable name. Fixed the sorting. Addition of the placeholder 'jnienv'; why rename 'robj' -> 'obj'? As mentioned, the robj -> obj change was for consistency with the other code in these barrier set files.
Thumbs up on the technical part of this change. My comments are just about style and about the comments. You should make sure that you get a review from either Kim or David H. since they were on the original review team for JDK-8200235. Thanks for the Review Dan!
You're welcome. It would be good to get this fix in today if at all possible... Once your fix is in, we can close my BACKOUT bug as "will not fix"...
Dan
Thanks, /Erik
Dan
Bug ID: https://bugs.openjdk.java.net/browse/JDK-8202381 My (long version) analysis on the architectures (that I made changes to) of what is going on: == SPARC == This is where the SIGBUS happened. We used to get the jobject/jweak in O1, clear the jweak tag in O1, and resolve the handle by loading from O1 to O5. This was in fact already broken, since 8176100 "[REDO][REDO] G1 Needs pre barrier on dereference of weak JNI handles". The registers used to pass the arguments into the C function (O0, O1 and O2) must not be touched throughout the call, despite being caller saved, as we have not saved the register window. By clearing the jweak in O1, the wrong access functions are used in the slow path if the speculative load is invalidated, either due to the second safepoint counter not being able to the first sampled safepoint counter, or the load of the int popping in to the signal handler. I made this problem worse in my 8200235 patch by not just clearing O1, but also resolving into O1, which clobbered the register even further. This caused intermittent failures in the following sequence of events: 1) the safepoint counter is read, and we are not in a safepoint, 2) the jobject/jweak is resolved, clobbering O1, 3) a safepoint is triggered, 4) after resolving the jobject/jweak, the safepoint counter is sampled again with a new number, causing a call to the slow path where O1 no longer contains the jobject/jweak. In my proposed changes, this was remedied by moving O1 to O5 first, then clearing the jweak value (in O5), then resolving (in O5) and then performing the speculative load with all passed C arguments untouched in their O1, O2 and O3 registers, as required for the fallback code. == x8664 == On x8664, the passed in jobject/jweak is already moved to a separate scratch register. Therefore, the problem of clearing the jweak tag in the jobject does not apply here. However, rscratch1 was the register I picked as a temporary register for resolving. After reading this code more carefully, I realize this is the r10 register, which coincides with the register used to remember the old counter. So using that temporary register is not safe. I chose a new register instead: r8. This is, like the other registers used in this code, a caller saved register, that does not intersect with the input registers of the function arguments. That makes it safe to use in this context. I added DEBUGONLY code for clobbering this temporary register on x8664, to make sure we more easily catch such problems. == AArch64 == On AArch64, the jobject/jweak is already in r3 when cleared (which does not intersect with the registers used in the C calling convention, and is caller saved). Therefore, the clearing of the jweak tag there is correct, and so is the resolution. The temporary register used is r8, which is used as a temporary register by surrounding code in this context (and is caller saved). Apart from this, I also added a jnienv parameter to tryresolvejobjectinnative, which just passes crarg0 containing the jni environment, without doing anything to it. We need to read from this register in ZGC, and I would prefer not to hardcode in the backend that crarg0 always refers to the jni environment in this context. There are still remaining problems in ARM that the jweak tag is cleared in r1, which will cause the slow path to resolve jweaks as jobjects. However, I would like to not address that issue in this fix, as my original changes did not touch that ARM code. I would like to instead file a bug for that to be solved by another brave soul who wants to change the jniFastGetField code in the ARM port. Testing: I ran the httpclient tests (that failed because of this) with --test-repeat 100, and ran that 3 times. No problems discovered. I am also finishing up hs-tier1-3 and jdk-tier1-3. Thanks, /Erik
- Previous message: RFR: 8202381: (Solaris) SIGBUS in # V [libjvm.so+0xcee494] jni_GetIntField+0x224
- Next message: RFR: 8202381: (Solaris) SIGBUS in # V [libjvm.so+0xcee494] jni_GetIntField+0x224
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]