RFR (XS) 8047212: fix race between ObjectMonitor alloc and verification code (original) (raw)
Carsten Varming varming at gmail.com
Tue Oct 20 15:52:01 UTC 2015
- Previous message: RFR (XS) 8047212: fix race between ObjectMonitor alloc and verification code
- Next message: RFR (XS) 8047212: fix race between ObjectMonitor alloc and verification code
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Dear Dan,
See inline.
On Tue, Oct 20, 2015 at 7:29 AM, Daniel D. Daugherty < daniel.daugherty at oracle.com> wrote:
Carsten,
It's been a long time! Thanks for the quick review...
It was a nice small change. Sorry, I am asking for a larger change.
Replies embedded below...
On 10/19/15, 10:09 PM, Carsten Varming wrote: Dear Dan, I thought acquire/release semantics mandated that the release should be paired with an acquire.
Yes, src/share/vm/runtime/orderAccess.hpp is pretty clear about that. I modeled this fix after some other ObjectMonitor code that doesn't (yet) follow that mandate. In this case, the first line in verifyobjmonisinpool should be changed from "PaddedEnd * block = (PaddedEnd *)gBlockList" to use a loadacquire. Actually, it seems like there are a lot of reads of gBlockList that should be replaced with a loadacquire in synchronizer.cpp. Yes, that would be a good change and cleanup. The really annoying part is that the current fix seems to have resolved the bug... Current numbers: 700K iterations of 4 fastdebug instances in parallel in 4.5 days. I typically saw one failure per day in a three day run...
I suspect the data dependency between "block = gBlockList" in the top of verify_objmon_isinpool and "block = block->FreeNext" at the bottom prevents the compiler from re-ordering the loads. Hence the assembly will be correct. Likewise the data dependency prevents the cpu from re-ordering the loads. Hence the program doesn't crash. Nevertheless, I encourage you to use the right acquire/release semantics.
Otherwise the change looks great. Thanks!
You are welcome.
Carsten
Dan
Carsten (not a reviewer). On Mon, Oct 19, 2015 at 8:02 PM, Daniel D. Daugherty <_ _daniel.daugherty at oracle.com> wrote: Greetings, I have a fix for a long standing race between the lock-free ObjectMonitor verification code and the normal (locked) ObjectMonitor block allocation code path. For this fix, I would like at least a Runtime team reviewer and a Serviceability team reviewer. Thanks! JDK-8047212 runtime/ParallelClassLoading/bootstrap/random/inner-complex assert(ObjectSynchronizer::verifyobjmonisinpool(inf)) failed: monitor is invalid https://bugs.openjdk.java.net/browse/JDK-8047212 Webrev URL: http://cr.openjdk.java.net/~dcubed/8047212-webrev/0-jdk9-hs-rt/ Testing: Aurora Adhoc RT-SVC nightly batch 4 inner-complex fastdebug parallel runs for 4+ days and 600K iterations without seeing this failure; the experiment is still running; final results to be reported at the end of the review cycle JPRT -testset hotspot This fix: - makes ObjectMonitor::gBlockList volatile - uses "OrderAccess::releasestoreptr(&gBlockList, temp)" to make sure the new block updates happen before gBlockList is changed to refer to the new block - add SA support for a "static pointer volatile" field like: static ObjectMonitor * volatile gBlockList; See the following link for a nice description of what "volatile" means in the different positions on a variable/parameter decl line:
http://www.embedded.com/electronics-blogs/beginner-s-corner/4023801/Introduction-to-the-Volatile-Keyword Thanks, in advance, for any comments, questions or suggestions. Dan
- Previous message: RFR (XS) 8047212: fix race between ObjectMonitor alloc and verification code
- Next message: RFR (XS) 8047212: fix race between ObjectMonitor alloc and verification code
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]