RFR: 8212933: Thread-SMR: requesting a VM operation whilst holding a ThreadsListHandle can cause deadlocks (original) (raw)
Robbin Ehn robbin.ehn at oracle.com
Wed Oct 31 07:03:30 UTC 2018
- Previous message: RFR: 8212933: Thread-SMR: requesting a VM operation whilst holding a ThreadsListHandle can cause deadlocks
- Next message: RFR: 8212933: Thread-SMR: requesting a VM operation whilst holding a ThreadsListHandle can cause deadlocks
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Thanks Dan, Robbin
On 10/30/18 2:18 PM, Daniel D. Daugherty wrote:
On 10/29/18 9:31 PM, David Holmes wrote:
Thanks for the explanation Robbin.
The inline patch also seems fine. I hope the other reviewers noticed it. Yes, but I forgot to reply to it. Thumbs up. Dan
David On 29/10/2018 7:05 PM, Robbin Ehn wrote: Hi David, On 29/10/2018 07:20, David Holmes wrote: Hi Robbin,
On 29/10/2018 6:08 AM, Robbin Ehn wrote: Hi Dan,
Thanks for looking at this, here is the update: Inc: http://cr.openjdk.java.net/~rehn/8212933/v2/inc/webrev/ Full: http://cr.openjdk.java.net/~rehn/8212933/v2/webrev/ I can't say I really understand the change in protocol here and why all the cancel operations are no longer needed. I see the handshake VM operations reusing the initial "threads list" but I'm unclear how they might be affected if additional threads are added to the system before the Threadslock is acquired? The ThreadsList is a snapshot of all the JavaThreads at that time in the VM. Handshake all threads only handshake those JavaThreads. We do not care about new threads. The typical generic use-case is the similar to RCU. You first update a global state and emit the handshake when the handshake return no thread can see the old state. GlobalFuncPtr = somenewfunc; HandshakeAllThreads; ------------------------------ No thread can be executing the old func. If the JavaThreads have a local copy of GlobalFuncPtr the handshake operation would be to update the local copy to somenewfunc. It works for both Java and for VM resources that respect safepoints. For a pure VM resource it's much cheaper to use the GlobalCounter. The Threadslock must only be held for S/R protocol. With changes to the S/R protocol, such as using handshake instead, we can remove Threadslock for handshakes completely. (with a other small fixes) The cancel is no longer needed since the terminated threads are visible to the VM thread when we keep the arming threadslist. We add terminated threads as safe for handshake. But if we handshake a terminated thread we do not execute the handshake operation, instead just clear the operation and increment the completed counter. (the VM thread cancels the operation) I hope that helped?
A couple of specific comments: src/hotspot/share/runtime/handshake.hpp cancelinner() is dead now. --- src/hotspot/share/runtime/handshake.cpp This was an odd looking for loop before your change and now looks even more strange: for ( ; JavaThread *thr = jtiwh.next(); ) { can it not simply be a more normal looking: for (JavaThread *thr = jtiwh.next(); thr != NULL; thr = jtiwh.next()) { ? Thanks, fixed with below patch. /Robbin diff -r 5f8b292c473f src/hotspot/share/runtime/handshake.cpp --- a/src/hotspot/share/runtime/handshake.cpp Sun Oct 28 20:57:24 2018 +0100 +++ b/src/hotspot/share/runtime/handshake.cpp Mon Oct 29 09:32:26 2018 +0100 @@ -166,1 +166,1 @@ - for ( ; JavaThread *thr = jtiwh.next(); ) { + for (JavaThread *thr = jtiwh.next(); thr != NULL; thr = jtiwh.next()) { @@ -198,1 +198,1 @@ - for ( ; JavaThread *thr = jtiwh.next(); ) { + for (JavaThread *thr = jtiwh.next(); thr != NULL; thr = jtiwh.next()) { diff -r 5f8b292c473f src/hotspot/share/runtime/handshake.hpp --- a/src/hotspot/share/runtime/handshake.hpp Sun Oct 28 20:57:24 2018 +0100 +++ b/src/hotspot/share/runtime/handshake.hpp Mon Oct 29 09:32:26 2018 +0100 @@ -63,1 +62,0 @@ - void cancelinner(JavaThread* thread);
--- Thanks, David /Robbin On 26/10/2018 17:38, Daniel D. Daugherty wrote: On 10/26/18 10:33 AM, Robbin Ehn wrote: Hi, please review. When the VM thread executes a handshake it uses different ThreadsLists during the execution. A JavaThread that is armed for the handshake when it is already in the exit path in VM will cancel the handshake. Even if the VM thread cannot see this thread after the initial ThreadsList which where used for arming, the handshake can progress when the exiting thread cancels the handshake. But if a third thread takes a ThreadsList where the exiting JavaThread is present and tries to execute a VM operation, hence waiting on VM thread to finish the handshake, the JavaThread in the exit path can never reach the handshake cancellation point. VM thread cannot finishes the handshake and the third thread is stuck waiting on the VM thread. To allow holding a ThreadsList when executing a VM operation we instead let the VM thread use the same ThreadsList over the entire handshake making all armed threads visible to the VM thread at all time. And if VM thread spots a terminated thread it will count that thread is already done by only clearing it's operation. Passes local stress testing, t1-5 and the deadlock is no longer reproduce-able. Added a jtreg handshake + thread suspend test as a reproducer. Issue: https://bugs.openjdk.java.net/browse/JDK-8212933 Code: http://cr.openjdk.java.net/~rehn/8212933/v1/webrev/ src/hotspot/share/runtime/handshake.hpp No comments. src/hotspot/share/runtime/handshake.cpp L358: void HandshakeState::processbyvmthread(JavaThread* target) { L359: assert(Thread::current()->isVMthread(), "should call from vm thread"); Both calls to handshakeprocessbyvmthread() which calls this function are made with the Threadslock held: MutexLockerEx ml(Threadslock, Mutex::nosafepointcheckflag); Looks like the lock is grabbed because of possiblyvmthreadcanprocesshandshake() which asserts: L351: // An externally suspended thread cannot be resumed while the L352: // Threadslock is held so it is safe. L353: // Note that this method is allowed to produce false positives. L354: assert(Threadslock->ownedbyself(), "Not holding Threadslock."); L355: if (target->isextsuspended()) { L356: return true; L357: } Also looks like vmthreadcanprocesshandshake() needs the Threadslock for the same externally suspended thread check. So I was going to ask that you add: assert(Threadslock->ownedbyself(), "Not holding Threadslock."); after L359, but how about a comment instead: // Threadslock must be held here, but that is assert()ed in // possiblyvmthreadcanprocesshandshake().
src/hotspot/share/runtime/thread.hpp No comments. src/hotspot/share/runtime/thread.cpp No comments. src/hotspot/share/runtime/threadSMR.cpp No comments. test/hotspot/jtreg/runtime/handshake/HandshakeWalkSuspendExitTest.java Very nice test! It specifically exercises ThreadLocalHandshakes with JavaThread suspend/resume. runtime/Thread/SuspendAtExit.java only ran into this bug by accident (JDK-8212152) so I like the targeted test. L49: while(!exitnow) { nit - please add a space before '(' L51: for (int i = 0; i < threads.length; i+=2) { L58: for (int i = 0; i < threads.length; i+=2) { nit - please added spaces around '+=' So why every other thread? A comment would be good... L52: wb.handshakeWalkStack(null, true); I'm guessing the 'null' parameter means current thread, but that's a guess on my part. A comment would be good. L82: for (int i = 0; i < threads.length; i++) { L83: threads[i].join(); L84: } Thanks for cleaning up the testthreads. That will make the JTREG thread sweeper happy. However, you don't save the testexitthread references and you don't clean those up either. Yes, I realize that they are supposed to exit, but if something hangs up on exit, I'd rather have a join() hang failure in this test's code than have the JTREG thread sweeper catch it. Dan
Thanks, Robbin
- Previous message: RFR: 8212933: Thread-SMR: requesting a VM operation whilst holding a ThreadsListHandle can cause deadlocks
- Next message: RFR: 8212933: Thread-SMR: requesting a VM operation whilst holding a ThreadsListHandle can cause deadlocks
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]