Review request: 7072527 CMS: JMM GC counters overcount in some cases (original) (raw)

Kevin Walls kevin.walls at oracle.com
Wed Aug 3 04:56:38 PDT 2011


Thanks Ramki, and thanks for chasing up and clarifying the OCA point.

Kevin

On 01/08/11 19:35, Ramki Ramakrishna wrote:

Looks good to me; ship it!

-- ramki On 7/29/2011 4:13 PM, Kevin Walls wrote:

Actually I missed the other comment that we could go further and remove the unneeded constructor. http://cr.openjdk.java.net/~kevinw/7072527/webrev.01/ Thanks Kevin On 29/07/11 17:35, Kevin Walls wrote: Hi --

I got this ready in a webrev: http://cr.openjdk.java.net/~kevinw/7072527/webrev.00/ The new test is passing, as are the old ones. 8-) Also, this change means that a CMS cycle ending in concurrent mode failure now counts as one collection. One could argue that either way (should it be 2 collections?) - but I'm thinking if we are counting "completed" collections then we are now counting correctly! Thanks Kevin

On 29/07/11 02:08, Y. S. Ramakrishna wrote: I filed: 7072527 CMS: JMM GC counters overcount in some cases On 07/28/11 17:29, Y. S. Ramakrishna wrote: Hi Kevin --

thanks for jumping on this! More inline below ... On 07/28/11 09:33, Krystal Mok wrote: Hi Kevin,

Thank you for taking care of this, and it's good to see the problem is verified. I think whether or not the suggested fix is sufficient depends on what paths can reach CMSCollector::docompactionwork(). If all paths that can reach CMSCollector::docompactionwork() come from GenCollectedHeap::docollection(), then the fix should be good to go. Otherwise it'll need a better workaround. I believe all concurrent mode failures/interrupts (which includes the System.gc() case) does come from GenCollectedHeap::docollection(), but I'm not exactly sure about this, could anybody please clarify on it? Yes, i believe this is indeed the case, and my browsing of the code using cscope seemed to confirm that belief. More below ...

Regards, Kris Mok On Thu, Jul 28, 2011 at 10:12 PM, Kevin Walls <kevin.walls at oracle.com <mailto:kevin.walls at oracle.com>> wrote:


Hi -- 6580448 was marked as a duplicate of 6581734, which fixed the fact that CMS collections were just not counted at all - with CMS, only a stop the world full gc would be counted in the stats. But looks like you're right... Here is a quick variation of the testcase from 6581734 which shows the same thing, and this verifies the same, and is solved by ExplicitGCInvokesConcurrent. If there is no other feedback I can test if the removal of the TraceCMSMemoryManagerStats() call in CMSCollector::docompactionwork is all we need...

Kevin, yes, it would be great if you could verify this and push the fix. I am not sure if the push would need to wait for the signing of OCA from Kris, but best to check with Those Who Would Know Such Things. Since the original CR has been closed, i'll open one momentarily and can make you RE (if that's OK with you). I'll be happy to serve as reviewer of the change. As regards the jstat counter reporting two pauses per concurrent CMS cycle, I am of two minds on what the original intention was. I'd have originally regarded the double increment as a bug, but as you state it is really two pauses, even if part of a single cycle. And it makes sense to count them as two. I agree that this should be documented and left alone, given how long we have had this behaviour, and the alternative (of counting cycles, rather than pauses) may be no better (or arguably worse). There's actually an open CR for this which we can redirect into a CR to update the relevant documentation. -- ramki

Regards Kevin

/* * Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved. * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. * * This code is free software; you can redistribute it and/or modify it * under the terms of the GNU General Public License version 2 only, as * published by the Free Software Foundation. * * This code is distributed in the hope that it will be useful, but WITHOUT * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License * version 2 for more details (a copy is included in the LICENSE file that * accompanied this code). * * You should have received a copy of the GNU General Public License version * 2 along with this work; if not, write to the Free Software Foundation, * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. * * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA * or visit www.oracle.com <http://www.oracle.com> if you need additional information or have any * questions. */ /* * @test TestFullGCount.java * @bug * @summary * @run main/othervm -XX:+UseConcMarkSweepGC TestFullGCCount * */ import java.util.*; import java.lang.management.*;

public class TestFullGCCount { private String poolName = "CMS"; private String collectorName = "ConcurrentMarkSweep"; public static void main(String [] args) { TestFullGCCount t = null; if (args.length==2) { t = new TestFullGCCount(args[0], args[1]); } else { System.out.println("Defaulting to monitor CMS pool and collector."); t = new TestFullGCCount(); } t.run(); } public TestFullGCCount(String pool, String collector) { poolName = pool; collectorName = collector; } public TestFullGCCount() { } public void run() { int count = 0; int iterations = 20; long counts[] = new long[iterations]; boolean diffAlways2 = true; // assume we will fail for (int i=0; i<iterations; i++) {_ _System.gc();_ _counts[i] = checkStats();_ _if (i>0) { if (counts[i] - counts[i-1] != 2) { diffAlways2 = false; } } } if (diffAlways2) { throw new RuntimeException("FAILED: difference in count is always 2."); } System.out.println("Passed."); } private long checkStats() { long count = 0; List pools = ManagementFactory.getMemoryPoolMXBeans(); List collectors = ManagementFactory.getGarbageCollectorMXBeans(); for (int i=0; i<collectors.size(); i++) {_ _GarbageCollectorMXBean collector = collectors.get(i);_ _String name = collector.getName();_ _if (name.contains(collectorName)) {_ _System.out.println(name + ": collection count = "_ _+_ _collector.getCollectionCount());_ _count = collector.getCollectionCount();_ _}_ _}_ _return count;_ _}_ _}_ _On 27/07/11 17:12, Krystal Mok wrote:_ _Hi all,_ _I've been looking at a strange inconsistency of full GC count_ _recorded by jvmstat and JMM counters. I'd like to know which_ _ones_ _of the following behaviors are by design, which ones are_ _bugs, and_ _which ones are just my misunderstanding. I apologize for_ _making a_ _short story long..._ _=====================================================_ _The counters involved:_ _* A jvmstat counter named "sun.gc.collector.1.invocations"_ _keeps_ _track of the number of pauses occured as a result of a major_ _collection. It is used by utilities such as jstat as the_ _source of_ _"FGC" (full collection count), and the old gen collection_ _count in_ _Visual GC. It's updated by an TraceCollectorStats object._ _* A JMM counter, GCMemoryManager::numcollections, keeps_ _track of_ _the number of collections that have ended. This counter is_ _used as_ _HotSpot's implementation of the JMX_ _GarbageCollectorMXBean.getCollectionCount(). It's updated by_ _either a TraceMemoryManagerStats object or a_ _TraceCMSMemoryManagerStats object._ _To show the situation, I've made a screenshot of a VisualVM_ _and a_ _JConsole running side by side, both are monitoring the_ _VisualVM's_ _GC stats:_ _http://dl.iteye.com/upload/attachment/524811/913cb0e1-7add-3ac0-a718-24ca705cad22.png_ _(I'll upload the screenshot to somewhere else if anybody_ _can't see it)_ _The VisualVM instance is running on JDK6u26, with ParNew+CMS._ _In the screenshot, Visual GC reports that the old gen_ _collection_ _count is 20, while JConsole reports 10._ _I see that there was this bug:_ _6580448: CMS: Full GC collection count mismatch between_ _GarbageCollectorMXBean and jvmstat (VisualGC)_ _I don't think the current implementation has a bug in the sense_ _that the two counters don't report the same number._ _This behavior seems reasonable, but the naming of the value in_ _these tools are confusing: both tools say "collections", but_ _apparently the number in Visual GC means "number of pauses"_ _where_ _as the number in JConsole means "number of collection cycles"._ _It'd be great if the difference could be documented_ _somewhere, if_ _that's the intended behavior._ _And then the buggy behavior. Code demo posted on gist:_ _https://gist.github.com/1106263_ _Starting from JDK6u23, when using CMS without_ _ExplicitGCInvokesConcurrent, System.gc() (or_ _Runtime.getRuntime().gc(), or MemoryMXBean.gc() via JMX) would_ _make the JMM GC counter increment by 2 per invocation, while_ _the_ _jvmstat counter is only incremented by 1. I believe the_ _latter is_ _correct and the former needs some fixing._ _=====================================================_ _My understanding of the behavior shown above:_ _1. The concurrent GC part:_ _There are 2 pauses in a CMS concurrent GC cycle, one in the_ _initial mark phase, and one in the final remark phase._ _To trigger a concurrent GC cycle, the CMS thread wakes up_ _periodically to see if it shouldConcurrentCollect(), and_ _trigger a_ _cycle when the predicate returned true, or goes back to_ _sleep if_ _the predicate returned false. The whole concurrent GC cycle_ _doesn't go through GenCollectedHeap::docollection()._ _The jvmstat counter for old gen pauses is updated in_ _CMSCollector::doCMSoperation(CMSoptype op), which exactly_ _covers both pause phases._ _The JMM counter, however, is updated in the concurrent sweep_ _phase, CMSCollector::sweep(bool asynch), if there was no_ _concurrent mode failure; or it is updated in_ _CMSCollector::docompactionwork(bool clearallsoftrefs)_ _in case_ _of a bailout due to concurrent mode failure (advertised as_ _so in_ _the code comments). So that's an increment by 1 per_ _concurrent GC_ _cycle, which does reflect the "number of collection cycles",_ _fair_ _enough._ _So far so good._ _2. The System.gc() part:_ _Without ExplicitGCInvokesConcurrent set, System.gc() does a_ _stop-the-world full GC, which consists of only one pause, so_ _"number of pauses" would equal "number of collections" in_ _this case._ _It will go into GenCollectedHeap::docollection(); both the_ _jvmstat and the JMM GC counter gets incremented by 1 here,_ _TraceCollectorStats tcs(gens[i]->counters()); TraceMemoryManagerStats tmms(gens[i]->kind()); But, drilling down into: gens[i]->collect(full, doclearallsoftrefs, size, istlab); That'll eventually go into: CMSCollector::acquirecontrolandcollect(bool full, bool clearallsoftrefs) System.gc() is user requested so that'll go further into mark-sweep-compact: CMSCollector::docompactionwork(bool clearallsoftrefs) And here, it increments the JMM GC counter again (remember it was in the concurrent GC path too, to handle bailouts), even though this is still in the same collection. This leads to the "buggy behavior" mentioned earlier. The JMM GC counter wasn't added to CMS until this fix got in: 6581734: CMS Old Gen's collection usage is zero after GC which is incorrect The code added to CMSCollector::docompactionwork() works fine in the concurrent GC path, but interacts badly with the original logic in GenCollectedHeap::docollection(). ===================================================== I thought all concurrent mode failures/interrupts come from GenCollectedHeap::docollection(). If that's the case, then it seems unnecessary to update the JMM GC counter in CMSCollector::docompactionwork(), simply removing it should fix the problem. With that, I'd purpose the following (XS) change: (diff against HS20) diff -r f0f676c5a2c6 src/share/vm/gcimplementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp --- a/src/share/vm/gcimplementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp Tue Mar 15 19:30:16 2011 -0700 +++ b/src/share/vm/gcimplementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp Thu Jul 28 00:02:41 2011 +0800 @@ -2022,9 +2022,6 @@ intrasweepestimate.paddedaverage()); } - { - TraceCMSMemoryManagerStats(); - } GenMarkSweep::invokeatsafepoint(cmsGen->level(), refprocessor(), clearallsoftrefs); #ifdef ASSERT The same goes for the changes in: 7036199: Adding a notification to the implementation of GarbageCollectorMXBeans ===================================================== P.S. Is there an "official" name for the counters that I referred to as "jvmstat counters" above? Is it just jvmstat, or PerfData or HSPERFDATA?



More information about the serviceability-dev mailing list