RFR: 8168914: Crash in ClassLoaderData/JNIHandleBlock::oops_do during concurrent marking (original) (raw)
coleen.phillimore at oracle.com coleen.phillimore at oracle.com
Mon Feb 20 22:29:59 UTC 2017
- Previous message: RFR: 8168914: Crash in ClassLoaderData/JNIHandleBlock::oops_do during concurrent marking
- Next message: RFR: 8168914: Crash in ClassLoaderData/JNIHandleBlock::oops_do during concurrent marking
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 2/20/17 10:47 AM, Erik Helin wrote:
Ok, lets see if we can wrap this up :) I just uploaded a new version, 03: - incremental: http://cr.openjdk.java.net/~ehelin/8168914/02-03/ - full: http://cr.openjdk.java.net/~ehelin/8168914/03/
The following changes have been made to the original patch: 00 -> 01: - Updated copyright year - Fixed typo in comment - Updated stale comment - Removed unnecessary loadaquire when loading c->next 01 -> 02: - ChunkedHandleList now derives ValueObj instead of CHeapObj - Both ChunkedHandleList and Chunk are now classes instead of structs - Changed capacity of chunk to 32 from 64 (same capacity as used by JNIHandleBlock) 02 -> 03: - Only use loadacquire when reading the size for the "head" chunk - Fix grammar in comment - Refer to the correct lock in comment AFAICS (scanning through all the emails) there are now only three remaining comments: - Thomas: can we assert in ChunkedHandeList::add that the correct mutex is locked? - Then we would have to pass the mutex along as a parameter to
add
, which to me seems unnecessary. Then ChunkedHandleList::add might just as well lock the mutex instead of asserting. But thanks for highlight this, I noticed I referred to the wrong lock in the comment :) - Coleen: can we remove the word unsafe from ClassLoaderData::removehandleunsafe? - I'd like to keep it until we design a better removehandle function. There is only one user of removehandleunsafe, so changing the name in the future should be easy.
ok.
Coleen
- Kim: can we have something like
convertxtoy
instead of ((oop) op) = NULL? - I would like to defer that to another patch, but I agree that it is rather nasty. Mostly because I don't know what the "right" API should look like, since this is the first time in run into this problem. Thanks, Erik On 02/14/2017 05:40 PM, Erik Helin wrote: Hi all,this patch aims to solve the bug [0] by making it safe for a GC to concurrently traverse the oops in a ClassLoaderData. The problem right now is that ClassLoaderData::handles is a JNIHandleBlock. It is not safe for one thread to iterate over the oops in a JNIHandleBlock while another thread concurrently adds a new oop to the JNIHandleBlock. My proposed solution is to replace JNIHandleBlock with another data structure for ClassLoaderData. ClassLoaderData only needs a place to store oops that a GC can iterate over, it has no use for the rest of the methods in the JNIHandleBlock class. I have implemented a minimal chunked linked list data structure (internal to ClassLoaderData) with only two public methods: - add (only executed by one thread at a time) - oopsdo (can be executed by any number of threads, possibly concurrently with a call to
add
) ChunkedHandleList uses loadacquire and releasestore to synchronize access to the underlying chunks (arrays). Since ChunkedHandleList utilizes the (rather) minimal requirements of its user (ClassLoaderData), I decided to keep the class internal to ClassLoaderData for now. If other find it useful elsewhere, the we can try to create a more generalized version (this is trickier than it appears at first sight, I tried ;)). I also changed ClassLoaderData::removehandle to write NULL to the oop* (that is represented by a jobject), instead of using a sentinel oop as done by JNIHandleBlock. The GC closures should be prepared to handle a field in a Java object becoming NULL, so this should be fine. Bug: https://bugs.openjdk.java.net/browse/JDK-8168914 Webrev: http://cr.openjdk.java.net/~ehelin/8168914/00/ Testing: - Tier 1 (aka JPRT), 2, 3, 4, 5. I would appreciate quite a few reviews on this patch, it contains a nice mixture of: - me venturing into the runtime code :) - lock free code - unreproducible bug (even though I'm sure of the root cause) Thanks for everyone participating in the discussions around this bug and potential solutions: Volker, Coleen, Mikael G, StefanK, Erik Ö and Jiangli. Thanks! Erik [0]: https://bugs.openjdk.java.net/browse/JDK-8168914
- Previous message: RFR: 8168914: Crash in ClassLoaderData/JNIHandleBlock::oops_do during concurrent marking
- Next message: RFR: 8168914: Crash in ClassLoaderData/JNIHandleBlock::oops_do during concurrent marking
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]