[9] RFR(S): 8075805: Crash while trying to release CompiledICHolder (original) (raw)

Doerr, Martin martin.doerr at sap.com
Tue Aug 25 14:53:27 UTC 2015

Previous message: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder
Next message: RFR: 8080293: AARCH64: Remove unnecessary dmbs from generated CAS code
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Tobias,

thanks for your quick response. Unfortunately, we neither have a good reproduction case nor a regression test which is actually the reason why we did not post this earlier. We had observed very sporadic assertions or freeing of unallocated memory.

Basically, I believe that cleaning the inline caches before transitioning from unloaded to zombie is the right thing. However, there's still the problem that it's hard to test.

Additionally, it may be required to adapt a couple of assertions. We modified the following ASSERT code (based on hotspot 25)

in CompiledIC::is_call_to_compiled(): Accessing cached_metadata() may be unsafe if !caller->is_alive().
in CompiledIC::is_call_to_interpreted(): CodeBlob* db may be gone if cb is unloaded.
in CompiledIC::verify(): is_call_to_compiled() has crashed. Seems to be unsafe in megamorphic case so we changed the order of the checks.
in nmethod::verify_clean_inline_caches(): In case of relocInfo::virtual_call_type CompiledIC may still point to a zombie method.

For details, please see below.

Best regards, Martin

assert( is_c1_method || !is_monomorphic || is_optimized() || !caller->is_alive() || (cached_metadata() != NULL && cached_metadata()->is_klass()), "sanity check");

#ifdef ASSERT { CodeBlob* db = CodeCache::find_blob_unsafe(dest); if (!db) { nmethod *nm = cb->as_nmethod_or_null(); assert(nm, "sanity"); if ( nm->is_in_use() || (nm->is_not_entrant() && (!SafepointSynchronize::is_at_safepoint() || !nm->is_marked_for_deoptimization())) ) { { // Dump some information. ttyLocker ttyl; tty->print_cr("ERROR: Did not find codeblob for destination %p", dest); nm->print(tty); Method m = nm->method(); if (m) { m->print_on(tty); } } assert(false, err_msg("nmethod is in state %d but destination blob is gone", (int)(nm->state()))); } } else assert(!db->is_adapter_blob(), "must use stub!"); } #endif / ASSERT */

assert(is_clean() || is_optimized() || is_megamorphic() || is_call_to_compiled() || is_call_to_interpreted() , "sanity check");

case relocInfo::virtual_call_type: {
  CompiledIC *ic = CompiledIC_at(&iter);
  // Ok, to lookup references to zombies here
  CodeBlob *cb = CodeCache::find_blob_unsafe(ic->ic_destination());
  if( cb != NULL && cb->is_nmethod() ) {
    nmethod* nm = (nmethod*)cb;
    // Verify that inline caches pointing to not_entrant methods are clean
    if (nm->is_not_entrant()) {
      assert(ic->is_clean(), "IC should be clean");
    }
  }
  break;
}
case relocInfo::opt_virtual_call_type: {

-----Original Message----- From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] Sent: Dienstag, 25. August 2015 14:52 To: Doerr, Martin; Igor Veresov Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net Subject: Re: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder

Hi Martin,

thanks for looking at this!

It's interesting that you encountered and fixed the same problem. Were you able to create a regression test?

I did not evaluate the impact on the safepoint duration but you are right that it may be affected. I talked to Mikael Gerdin from the GC team and he told me that they had issues before because the scanning of nmethods and their relocations took a significant amount of time.

I would therefore like to go with the alternative solution in the case of unloaded nmethods, i.e., cleaning their ICs in the sweeper. Unfortunately, I already pushed the fix so here is the incremental webrev: http://cr.openjdk.java.net/~thartmann/8075805/webrev.02/

I left the fix for the nmethods that are marked-for-deoptimization as it is, because it avoids cleaning the IC's of all zombie nmethods and simplifies the possible state transitions.

If you guys are fine with the change, I open a new bug/enhancement and send out a separate RFR.

Thanks, Tobias

On 25.08.2015 12:34, Doerr, Martin wrote:

Hi all,

we appreciate that this code gets cleaned up. Iterating over all nmethods in gcepilogue should fix the problem. Did anybody check the impact on the safepoint duration? We have also fixed this problem. We use the other approach and added following code to nmethod::makenotentrantorzombie: if (state == zombie) { MutexLockerEx ml(SafepointSynchronize::isatsafepoint() ? NULL : CompiledIClock); address lowboundary = verifiedentrypoint () + NativeJump::instructionsize; // See cleanupinlinecaches. RelocIterator iter(this, lowboundary); while (iter.next()) { if (iter.type() == relocInfo::virtualcalltype) { CompiledIC *ic = CompiledICat(&iter); ic->seticdestinationandvalue(SharedRuntime::getresolvevirtualcallstub(), (Metadata*)NULL); } } } (Note: seticdestinationandvalue is currently private.) As discussed in earlier emails, this also fixes the problem. An advantage is that this approach does the job in a concurrent phase without impacting the safepoint duration. Not sure which approach is the better one. Best regards, Martin

-----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Tobias Hartmann Sent: Dienstag, 25. August 2015 07:43 To: Igor Veresov Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net Subject: Re: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder On 24.08.2015 22:10, Igor Veresov wrote: Seems good to me. Thanks, Igor. Btw, did you find why there is a need for “marked for reclamation” state? No, I couldn't find a reason yet. I did some testing without this state and didn't run into any obvious problems. I'll file a bug and further investigate. It would be nice if we could save this transition. Best, Tobias

igor On Aug 24, 2015, at 12:58 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:

Thanks, Vladimir! Please see comments inline. On 21.08.2015 19:28, Vladimir Kozlov wrote: During our discussion about this problem we thought that we may need additional call nm->cleanupinlinecaches() by sweeper when we convert notentrant to zombie to prevent zombie pointing to an other nmethods. You think we don't need it? I looked at all possible nmethod transitions and came to the conclusion that the problem is only possible if there is a direct transition from non-entrant to zombie without a sweeper cylce in-between. The solution we discussed, i.e., always cleaning ICs for the non-entrant -> zombie transition, would fix the problem as well but would be more invasive than the proposed solution because it affects all nmethods.

Please, clarify changes in states transition of unloaded nmethods - "The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie." I don't see how changing makemarkednmethodszombies() call to makemarkednmethodsnotentrant() affects unloaded nmethods. Both make*() methods iterates only over isalive() nmethods (iter.nextalive()), so they skip unloaded. Yes, my description is wrong. It should be "The only impact is that marked nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie." The change does not affect the state transitions of unloaded nmethods. In other words, the change removes the shortcut that allowed nmethods that were marked for deoptimization to be converted to zombie at a safepoint if they were already non-entrant and not on the stack before. These nmethods now need an additional sweeper cycle to be converted to zombie. What spacing you changed in compiledIC.cpp because webrev does not show them? I fixed the wrong indentation in CompiledIC::settoclean(). You can see it in the webrev if you click on "Patch": http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/src/share/vm/code/compiledIC.cpp.patch Please, fix comment in vmoperations.cpp

// Make the dependent methods zombies - CodeCache::makemarkednmethodszombies(); + CodeCache::makemarkednmethodsnotentrant(); Fixed. New webrev: http://cr.openjdk.java.net/~thartmann/8075805/webrev.01/ Thanks, Tobias

Thanks, Vladimir On 8/21/15 7:36 AM, Tobias Hartmann wrote: Hi, please review the following patch. https://bugs.openjdk.java.net/browse/JDK-8075805 http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/ Problem: The VM crashes at a safepoint while trying to free a CompiledICHolder object that was enqueued for release after flushing a nmethod. The crash happens because the object is not a CompiledICHolder but Metadata which should not be removed. The problem is that at nmethod release, "CompiledIC::isicholderentry" is used to determine if the ICs of that nmethod still reference CompiledICHolder objects and if so, those objects are enqueued for release at the next safepoint. The method returns true if the destination of the IC is a C2I adapter, assuming that in this case the IC is in the to-interpreter state and the cached value must be a CompiledICHolder object. However, there are very rare cases where the IC is actually in the to-compiled state but the destination nmethod was already flushed and replaced by another allocation. Since the IC is still pointing to the same address in the code cache, the state of the IC is confused. Cleaning of inline caches that point to dead nmethods should prevent this. However, we do not clean ICs of nmethods that are converted to zombie themselves. Usually, that's okay because a zombie nmethod will be flushed before any dead nmethod it references. This is guaranteed because each nmethod goes through the states alive -> non-entrant -> zombie -> marked-for-reclamation before being flushed. Suppose we have two nmethods A and B, where A references B through an IC and B is always processed first by the sweeper. The following table shows the state transitions from top to bottom where lines marked with "S" show a transition in the corresponding iteration of the sweeper. state of A state of B ----------------------------------------- non-entrant non-entrant S [not on stack] [not on stack] S zombie zombie S marked marked S flushed flushed/re-allocated The IC of A will be cleaned in the first sweeper cycle because B is non-entrant so we don't need to clean ICs again if A is converted to zombie. Let's look at the following setting: state of A state of B ----------------------------------------- non-entrant S [not on stack] non-entrant S zombie [not on stack] zombie S marked marked S flushed flushed/re-allocated There are two problems here: - the IC of A is not cleaned because B is not yet non-entrant in the first iteration of the sweeper and afterwards A becomes zombie itself, - the transition from B to zombie happens outside the sweeper in 'CodeCache::makemarkednmethodszombies()' because the previous sweeper iteration already determined that the nmethod is not on the stack. The VM now crashes while flushing A because it still references B. Since B was replaced by an C2I adapter, we assume that A's IC is in the to-interpreter state and try to free a CompiledICHolder object which is actually Klass-Metadata for B. The detailed logs are below [1]. A similar problem occurs with nmethod unloading because unloaded nmethods transition directly to zombie: state of A state of B ----------------------------------------- unloaded unloaded S zombie zombie S marked marked S flushed flushed/re-allocated Again, we crash while flushing A. Solution: I removed the 'makemarkednmethodszombies()' and replaced it by calls to 'makemarkednmethodsnotentrant()'. This avoids the non-entrant -> zombie transition outside of the sweeper. The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie. I verified that this has no impact on performance. I also removed the code that was added by JDK-8059735 because now only the sweeper can set a nmethod to zombie. To fix the nmethod unloading case, I changed the implementation of CodeCache::gcepilogue to clean ICs of unloaded nmethods as well. Testing: - Executed failing tests for a week (still running) - JPRT - Performance (SPECjbb2005, SPECjbb2013, SPECjvm2008), no differences Thanks, Tobias

[1] Detailed logs for nmethod A (1178) and nmethod B (552): Inline cache at 0xffff80ffad89b017, calling 0xffff80ffad1063c0 cachedvalue 0x0000000000000000 changing destination to 0xffff80ffad66ae20 changing cached metadata to 0x0000000800034a18 IC at 0xffff80ffad89b017: monomorphic to compiled (rcvr klass) 'java/util/concurrent/ConcurrentHashMap': ### IC at 0xffff80ffad89b017: set to Nmethod 552/ ### Nmethod 1178/0xffff80ffad89ac50 (not entrant) being made zombie ### Nmethod 552/0xffff80ffad66ac10 (not entrant) being made zombie from makemarkednmethodszombies() ### Nmethod 552/0xffff80ffad66ac10 (zombie) being marked for reclamation ### Nmethod 1178/0xffff80ffad89ac50 (zombie) being marked for reclamation ### Nmethod 552/0xffff80ffad66ac10 (marked for reclamation) being flushed *flushing nmethod 552/0xffff80ffad66ac10. Live blobs:2325/Free CodeCache:235986Kb ### I2C/C2I adapter 0xffff80ffad66ac10 allocated ### Nmethod 1178/0xffff80ffad89ac50 (marked for reclamation) being flushed cleanupcallsite 0x0000000800034a18 to be freed, destination 0xffff80ffad66ae20 inline cache at 0xffff80ffad89b017 enqueueing icholder 0x0000000800034a18 to be freed *flushing nmethod 1178/0xffff80ffad89ac50. Live blobs:2346/Free CodeCache:235955Kb deleting icholder 0x0000000800034a18 ## nofmallocs = 211209, noffrees = 105760 ## memory stomp: GuardedMemory(0xffff80ff623ca180) baseaddr=0x00000008000349f8 tag=0x0000000800034a18 usersize=18446604433140746280 userdata=0x0000000800034a18 Header guard @0x00000008000349f8 is BROKEN

Previous message: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder
Next message: RFR: 8080293: AARCH64: Remove unnecessary dmbs from generated CAS code
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the hotspot-compiler-dev mailing list