RFC 7038914: VM could throw uncaught OOME in ReferenceHandler thread (original) (raw)

Peter Levart peter.levart at gmail.com
Thu May 9 12:13:42 UTC 2013


On 05/09/2013 01:33 PM, David Holmes wrote:

On 9/05/2013 9:16 PM, Peter Levart wrote:

Hi David,

On 05/09/2013 12:02 PM, David Holmes wrote: Hi Peter,

Thanks for clarifying the test details. A few follow ups: - The reference class does get initialized early in the VM startup well before Main will be loaded. But the use of the weakref doesn't hurt. Ok, so if this is guaranteed, we can remove the weakref construction.

- 500ms may not be enough on some platforms, particularly some embedded systems. Ideally we could code up a handshake using another weakref that would guarantee that the handler thread really survived past the OOM. But at some point this just becomes a no-reg-hard situation :) If we used a weakref then there should be a delay between the referenceHandlerThread.interrupt() and the release of the WeakReference's referent, otherwise the WeakReference could be cleared and enqueued before Reference Handler attempts to throw InterruptedException (this really happens - I tried without delay and the clearing/enqueueing was quicker than interrupt). After this initial delay (currently set to 500ms) releasing the referent and waiting for WeakReference to be enqueued (while executing System.gc()) is analogous to testing for referenceHandlerThread.isAlive() - the only difference being the code that has to be executed in Reference Handler while unwinding the stack until the state of thread changes to TERMINATED. Can this be delayed for a long time? What I was thinking was that after we interrupt we then create a new weakref

Can't do that immediately (no space)!

and we loop until the ref gets cleared, or we detect the reference thread is not alive (with a sleep in between). One of those two conditions must become true.

To create a weakref after interrupt, we have to 1st wait some time (to give Reference Handler thread a chance to throw OOME), then free the heap (release 'waste' and possibly do some System.gc()) to get some space for weakref creation. After we do that, a chance that Reference Handler thread is just in the process of unwinding the stack after uncaught OOME and referenceHandlerThread.isAlive() still returns true is minimal. Do you think we should wait some more to be sure thread is still alive?

- I'm not certain that setting waste=null is sufficient to guarantee the next allocation will succeed under all GC's. GC folk can confim/deny that. We can pre-allocate the test-fail exception before doing fillHeap() so that we can conditionally throw it later, like this: So when we do the throw there is logic in the launcher that will try to print the stacktrace, which may also encounter OOME unless we have freed the heap. So I think we want to release memory, I just wasn't certain that simply setting waste=null would guarantee it.

'waste' is local variable and goes out of scope when main() is finished. So in final variation I haven't even bothered to set it to null at the end. What do you suggest for successful test failure? Setting 'waste' to null, followed by a System.gc() and Thread.sleep()? Can we signal test failure in another way? System.exit(1)?

Regards, Peter

David ------

public class Test { static Object[] fillHeap() { Object[] first = null, last = null; int size = 1 << 20;_ _while (size > 0) { try { Object[] array = new Object[size]; if (first == null) { first = array; } else { last[0] = array; } last = array; } catch (OutOfMemoryError oome) { size = size >>> 1; } } return first; }

public static void main(String[] args) throws Exception { ThreadGroup tg = Thread.currentThread().getThreadGroup(); for ( ThreadGroup tgn = tg; tgn != null; tg = tgn, tgn = tg.getParent() ) ; Thread[] threads = new Thread[tg.activeCount()]; Thread referenceHandlerThread = null; int n = tg.enumerate(threads); for (int i = 0; i < n; i++) { if ("Reference Handler".equals(threads[i].getName())) { referenceHandlerThread = threads[i]; } } if (referenceHandlerThread == null) { throw new IllegalStateException("Couldn't find Reference Handler thread."); } // pre-allocate failure so that we don't cause OOME later Exception failure = new Exception("Reference Handler thread died."); Object waste = fillHeap(); referenceHandlerThread.interrupt(); // allow referenceHandlerThread some time to throw OOME Thread.sleep(500L); if (waste != null && // keep 'waste' reference alive until this point !referenceHandlerThread.isAlive()// check that the handler is still alive ) { throw failure; } } }

Regards, Peter Thanks, David On 9/05/2013 6:35 PM, Peter Levart wrote: On 05/09/2013 07:53 AM, David Holmes wrote: Hi Thomas,

On 9/05/2013 1:28 AM, Thomas Schatzl wrote: Hi,

please review the latest webrev for this patch that is available at http://cr.openjdk.java.net/~tschatzl/7038914/webrev.2/ As mentioned, it incorporates the fix and reproducer from Peter Levart. Fix is fine. I'm not sure about the test (sorry I didn't pay it attention earlier). Have you instrumented the code to verify that the test actually triggers an OOME? It may be possible that if the OOME did kill the reference thread that we wouldn't necessarily detect it using the sleeps etc. Also I don't understand the need for the actual weakRef usage and System.gc() etc. If the heap is full then the interrupt will wake the reference handler thread and the allocation will trigger the OOME. It doesn't matter if there are any references to process. The main logic might then reduce to: referenceHandlerThread.interrupt(); for (int i = 0; i < 10; i++) {_ _if (!referenceHandlerThread.isAlive())_ _throw new Exception("Reference-handler thread died");_ _Thread.sleep(1000);_ _}_ _which just gives it 10s to die else assumes all ok. ?_ _But I can live with existing test (it just might vacuously pass)_ _Cheers,_ _David_ _Hi David, Thomas,_ _Yes, this was just a left-over from my bug reproducer that_ _demonstrated_ _the situation where WeakReference was cleared, but nothing was_ _enqueue-d_ _into the ReferenceQueue. Reference Handler thread otherwise just_ _sleeps_ _in lock.wait() most of the time if there's nothing to do and_ _interrupting it's thread will trigger allocation of_ _InterruptedException_ _and consequently OOME nevertheless._ _One think to watch: Reference Handler thread is started in the_ _j.l.r.Reference static initializer, so the Reference class must be_ _initialized. I don't know if there is any guarantee that this is done_ _before entering main(). So there should be something explicit in_ _main()_ _method that ensures it._ _Also, throwing test-fail exception should be done after releasing the_ _heap or we'll just trigger another OOME without any message to_ _describe_ _the situation._ _Waiting in a loop for ReferenceHandler thread is not needed since the_ _test will normally succeed and so the whole loop will execute to the_ _end. Only when the test fails the loop will exit faster. In my_ _experience, the thread dies between 10-20ms after interrupting, so_ _waiting for about 500ms is enough I think. Also no System.gc() and_ _additional waiting is needed to report the outcome - just releasing_ _the_ _reference to 'waste' variable is enuugh in my experience to restore_ _the_ _heap into working condition._ _Here's the updated test that does all that:_ _public class OOMEInReferenceHandler {_ _static Object[] fillHeap() {_ _Object[] first = null, last = null;_ _int size = 1 << 20;_ _while (size > 0) { try { Object[] array = new Object[size]; if (first == null) { first = array; } else { last[0] = array; } last = array; } catch (OutOfMemoryError oome) { size = size >>> 1; } } return first; } public static void main(String[] args) throws Exception { // ensure ReferenceHandler thread is started new WeakReference(new Object()); ThreadGroup tg = Thread.currentThread().getThreadGroup(); for (ThreadGroup tgn = tg; tgn != null; tg = tgn, tgn = tg.getParent()); Thread[] threads = new Thread[tg.activeCount()]; Thread referenceHandlerThread = null; int n = tg.enumerate(threads); for (int i = 0; i < n; i++) { if("Reference Handler".equals(threads[i].getName())) { referenceHandlerThread = threads[i]; } } if (referenceHandlerThread == null) { throw new IllegalStateException("Couldn't find Reference Handler thread."); } Object waste = fillHeap(); referenceHandlerThread.interrupt(); Thread.sleep(500L); // release waste so we can allocate and throw normal exceptions waste = null; // check that the handler is still alive if (!referenceHandlerThread.isAlive()) { throw new Exception("Reference Handler thread died."); } } }

Regards, Peter Bugs.sun.com: http://bugs.sun.com/viewbug.do?bugid=7038914 JIRA: https://jbs.oracle.com/bugs/browse/JDK-7038914 Testing: JPRT, with the new reproducer passing In need of reviewers and a sponsor for this patch; as mentioned earlier I will set Peter (plevart) as author for this patch, i.e. send a patchset to the sponsor for this change with that author set. Thanks, Thomas



More information about the core-libs-dev mailing list