When running runTheseC (the compileThese version of runThese) we have run into native OOMEs, primarily with 32-bit Windows builds running on large Windows machines.
I left runTheseC running overnight on a Solaris machine in the hope of using libumem's memory leak detection. I couldn't get any useful information from it, but we're definitely leaking something:
$ pmap 5287 |grep heap
0000000000411000 4193312K rw--- [ heap ]
0000000100319000 2350824K rw--- [ heap ]
Metaspace usage is around 11MB with 40MB committed, so it seems we don't have a lot of live classes.
I used libumem to gather some snapshots of all malloc() calls in a run. One thing that shows up is the allocation of ParkEvents, which are leaked (intentionally, it appears).
runThese aggressively spawns threads which open JAR files, and those threads seem to end up in JVM_RawMonitorEnter:
libumem.so.1`malloc+0x2e
libjvm.so`__1cCosGmalloc6FLHpC_pv_+0x80
libjvm.so`__1cJParkEventIAllocate6FpnGThread__p0_+0x116
libjvm.so`__1cHMonitorMjvm_raw_lock6M_v_+0x248
libjvm.so`JVM_RawMonitorEnter+0x25
libzip.so`ZIP_Lock+0xd
libzip.so`Java_java_util_zip_ZipFile_read+0x43
0xfffffd7fed812094
ParkEvents on Solaris are 440 bytes each, and there are >10000 of them on the ParkEvent::FreeList after an hour of running the compileThese version of runThese.
I also tried an instrumented build on Windows, where I used HeapCreate to create a separate memory heap for allocating ParkEvents, so that they can be tracked externally to the process. After running runTheseC for around 30 minutes, that heap had grown to 256MB.
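For reference, the separate-heap instrumentation can be sketched roughly as follows. This is an illustration only, not the actual patch: the DedicatedHeap class and its counters are hypothetical names, and the non-Windows branch falls back to malloc() purely so the sketch compiles elsewhere. Only HeapCreate, HeapAlloc, HeapFree and HeapDestroy are real Win32 calls.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _WIN32
#include <windows.h>
#else
#include <cstdlib>
#endif

// Hypothetical helper: routes one kind of allocation to its own heap so that
// its growth can be observed separately from the rest of the process.
// The counters are not thread-safe; a real build would need atomics.
class DedicatedHeap {
public:
    DedicatedHeap() {
#ifdef _WIN32
        heap_ = HeapCreate(0, 0, 0);  // private heap, growable (no maximum size)
#endif
    }

    ~DedicatedHeap() {
#ifdef _WIN32
        if (heap_ != NULL) HeapDestroy(heap_);
#endif
    }

    void* allocate(std::size_t bytes) {
#ifdef _WIN32
        void* p = (heap_ != NULL) ? HeapAlloc(heap_, 0, bytes) : NULL;
#else
        void* p = std::malloc(bytes);  // portable fallback, assumption for the sketch
#endif
        if (p != nullptr) {
            ++count_;
            total_bytes_ += bytes;
        }
        return p;
    }

    void deallocate(void* p) {
        assert(p != nullptr);
#ifdef _WIN32
        HeapFree(heap_, 0, p);
#else
        std::free(p);
#endif
    }

    std::size_t count() const { return count_; }              // allocations so far
    std::size_t total_bytes() const { return total_bytes_; }  // cumulative bytes requested

private:
#ifdef _WIN32
    HANDLE heap_ = NULL;
#endif
    std::size_t count_ = 0;
    std::size_t total_bytes_ = 0;
};
```

With every ParkEvent allocation routed through such a heap, the size of that one heap can be watched from outside the process instead of trying to pick ParkEvents out of the general C heap.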
One theory for the root cause is that ParkEvent::Allocate is not designed to handle the load of 15-16 threads contending on a Monitor* through the JVM_RawMonitor* API.
Using the RawMonitor functions prevents the VM from using the JavaThread's ParkEvent and forces all those contending threads to hit ParkEvent::Allocate.
ParkEvents are maintained on a lock-free free list which is designed to avoid ABA problems by doing push-one, pop-all. There is therefore a potential for allocation spikes while one thread is doing a CAS on the FreeList, since other threads may find the list empty in the meantime and allocate new ParkEvents instead.
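To make the theory concrete, here is a minimal sketch of a push-one, pop-all free list. It is not the HotSpot code: Event, EventFreeList and the demo function are hypothetical names, and the real ParkEvent::Allocate differs in detail. The sketch only shows why a thread that finds the list detached falls back to a fresh allocation, replaying the race on a single thread so it is deterministic.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a ParkEvent; only the free-list link matters here.
struct Event {
    Event* next = nullptr;
};

// Push-one / pop-all free list (illustrative; not the HotSpot implementation).
class EventFreeList {
public:
    ~EventFreeList() {
        // Free whatever is still on the list so the sketch itself does not leak.
        Event* ev = detach_all();
        while (ev != nullptr) {
            Event* next = ev->next;
            delete ev;
            ev = next;
        }
    }

    // Pop-all: atomically take the whole list, leaving it empty.
    Event* detach_all() { return head_.exchange(nullptr); }

    // Push the chain [first..last] onto the list with a CAS loop.
    void push_chain(Event* first, Event* last) {
        assert(first != nullptr && last != nullptr);
        Event* old_head = head_.load();
        do {
            last->next = old_head;  // old_head is refreshed when the CAS fails
        } while (!head_.compare_exchange_weak(old_head, first));
    }

    // Push-one: return a single event to the list.
    void release(Event* ev) { push_chain(ev, ev); }

    // Reuse an event if one is visible, otherwise allocate a fresh one.
    Event* allocate() {
        Event* ev = detach_all();
        if (ev == nullptr) {
            // The list is empty, or another thread has it detached right now.
            ++fresh_;
            return new Event();
        }
        Event* rest = ev->next;
        ev->next = nullptr;
        if (rest != nullptr) {
            Event* last = rest;
            while (last->next != nullptr) last = last->next;
            push_chain(rest, last);  // put the remainder back
        }
        return ev;
    }

    std::size_t fresh_allocations() const { return fresh_.load(); }

private:
    std::atomic<Event*> head_{nullptr};
    std::atomic<std::size_t> fresh_{0};
};

// Replays the race on one thread: "A" holds the list detached while "B" allocates.
std::size_t demo_allocations_while_detached() {
    EventFreeList list;
    list.release(new Event());
    list.release(new Event());

    Event* held = list.detach_all();  // thread A: the list now looks empty
    Event* extra = list.allocate();   // thread B: sees nothing, allocates fresh

    Event* last = held;
    while (last->next != nullptr) last = last->next;
    list.push_chain(held, last);      // thread A: reattaches its two events
    list.release(extra);              // the list has grown from 2 to 3 events

    return list.fresh_allocations();
}
```

In the demo, two free events exist the whole time, yet a third is allocated because the list was detached at the moment of the request. If the theory holds, 15-16 threads contending continuously would hit that window repeatedly, and since ParkEvents are never freed, the list only ever grows.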
I=H (aggressive memory leak if this problem occurs, can easily lead to crash due to OOME)
L=L (very unlikely situation)
W=H (no known work-around if this situation arises)