RFR for bug JDK-8004807: java/util/Timer/Args.java failing intermittently in HS testing

roger riggs roger.riggs at oracle.com
Mon Jun 9 15:03:13 UTC 2014


Hi Eric, Martin,

I'm fine with the re-write. I'm not sure why the re-ordering of y3 will change the behavior of the test but it will provide more debugging info.

Roger

On 6/6/2014 9:32 PM, Martin Buchholz wrote:

If you don't want to go with my rewrite, you can conservatively just check in a 10x increase in all the constant durations and see whether the flakiness goes away.

On Thu, Jun 5, 2014 at 9:46 PM, Martin Buchholz <martinrb at google.com> wrote:

As with David, the cause of the failure is mystifying. How can things fail when we stay below the timeout value of 500ms? There's a bug either in Timer or my own understanding of what should be happening.

Anyways, raising the timeout value (as I have done in my minor rewrite) seems prudent. Fortunately, we can write this test in a way that doesn't require actually waiting for the timeout to elapse.

On Wed, Jun 4, 2014 at 1:23 PM, roger riggs <roger.riggs at oracle.com> wrote:

Hi Martin, Eric,

Of several hundred failures of this test, most were done in a JRE run with -Xcomp set. A few failures occurred with -Xmixed, none with -Xint.

The printed "elapsed" times (not normalized to hardware or OS) range from 24 to 132 (ms) with most falling into several buckets in the 30's, 40's, 50's and 70's.

I don't spot anything in the Timer.mainLoop code that might break when highly optimized but that's one possibility.

Roger
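A minimal sketch of the idea Martin describes above: a generous timeout that a passing run never has to wait out. This is not the rewrite under review. The class name, the constants, and the 10.5-period offset are assumptions chosen for illustration. A latch is released as soon as the expected number of runs is observed, so the limit only matters on failure.

```java
import java.util.Date;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PastFixedRateSketch {
    // Hypothetical constants, not taken from Args.java.
    static final long PERIOD_MS = 100_000;     // long period, so no extra run sneaks in
    static final int EXPECTED_RUNS = 11;       // 10 missed periods plus the current one
    static final long AWAIT_LIMIT_MS = 20_000; // generous bound, only reached on failure

    /** Schedules a fixed-rate task in the past and returns the observed run count. */
    static int runOnce() {
        final AtomicInteger runs = new AtomicInteger();
        final CountDownLatch done = new CountDownLatch(EXPECTED_RUNS);
        Timer timer = new Timer(true);
        try {
            // ASSUMPTION: first execution time lies 10.5 periods in the past.
            long offsetMs = 10 * PERIOD_MS + PERIOD_MS / 2;
            Date past = new Date(System.currentTimeMillis() - offsetMs);
            timer.scheduleAtFixedRate(new TimerTask() {
                @Override public void run() {
                    runs.incrementAndGet();
                    done.countDown();
                }
            }, past, PERIOD_MS);
            // Returns as soon as the catch-up runs finish; no fixed sleep.
            if (!done.await(AWAIT_LIMIT_MS, TimeUnit.MILLISECONDS)) {
                throw new AssertionError("only " + runs.get() + " runs within the limit");
            }
            return runs.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new AssertionError(e);
        } finally {
            timer.cancel();
        }
    }

    public static void main(String[] args) {
        System.out.println("runs = " + runOnce());
    }
}
```

Because the next execution after the catch-up burst is half a period (50 seconds here) in the future, the count read after the latch opens should not be disturbed by a further run unless the host stalls for that long.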

On 6/4/2014 3:25 PM, Martin Buchholz wrote:

Tests for Timer are inherently timing (!) dependent. It's reasonable for tests to assume that:

- reasonable events like creating a thread and executing a simple task should complete in less than, say, 2500ms.
- system clock will not change by a significant amount (> 1 sec) during the test. Yes, that means Timer tests are likely to fail during daylight saving time switchover - we can live with that. (We could even try to fix that, by detecting deviations between clock time and elapsed time, but probably not worth it.)

Can you detect any real-world unreliability in my latest version of the test, not counting daylight saving time switch?

I continue to resist your efforts to "fix" the test by removing chances for the SUT code to go wrong.

On Tue, Jun 3, 2014 at 11:28 PM, Eric Wang <yiming.wang at oracle.com> wrote:

Hi Martin,

Thanks for the explanation. Now I understand why you set DELAYMS to 100 seconds: it prevents failure on a slow host. However, I still have some concerns.

Because the test schedules tasks at a time in the past, all 13 tasks should be executed immediately and finish within a short time. If the elapsed-time limit is set to 50s (DELAYMS/2), the timer has plenty of time to finish the tasks, so I wonder whether that test point is lost.

Back to the original test: I think it is a test stabilization issue, because the original test assumes that the timer is cancelled in less than 1 second, before the 14th task is called. This assumption may not hold, for 2 reasons:

1. The test may be executed in jtreg concurrent mode on a slow host.
2. The system clock of a virtual machine may not be accurate (it may run faster than the physical clock).

To support the point, I changed the test as attached to print the execution times, to see whether the timer behaves as the API documentation describes. The result is as expected.
The unrepeated task executed immediately:
[1401855509336]

The repeated task executed immediately and repeated per 1 second:
[1401855509337, 1401855510337, 1401855511338]

The fixed-rate task executed immediately and catch up the delay:
[1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509836, 1401855510836]

Thanks,
Eric

On 2014/6/4 9:16, Martin Buchholz wrote:

On Tue, Jun 3, 2014 at 6:12 PM, Eric Wang <yiming.wang at oracle.com> wrote:

Hi Martin,

Calling sleep(1000) is not enough to reproduce the failure, because it is much lower than the period DELAYMS (10*1000) of the repeated task created by "scheduleAtFixedRate(t, counter(y3), past, DELAYMS)". Try sleep(DELAYMS), and the failure can be reproduced.

Well sure, then the task is rescheduled, so I expect it to fail in this case. But in my version I had set DELAYMS to 100 seconds. The point of extending the DELAYMS is to prevent flaky failure on a slow machine.

Again, how do we know that this test hasn't found a Timer bug? I still can't reproduce it.
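The arithmetic behind the sleep(1000) versus sleep(DELAYMS) exchange can be sketched as follows. This is an illustration only: the helper `dueExecutions` is hypothetical, and the 10.5-period offset for `past` is an assumption (it is consistent with the timestamps Eric printed, but the thread does not state it). A fixed-rate task runs once for every scheduled time that is at or before the current time.

```java
public class FixedRateDueCount {
    /**
     * Number of fixed-rate executions whose scheduled time is at or before nowMs,
     * for a task first scheduled at firstMs and repeating every periodMs.
     */
    static long dueExecutions(long firstMs, long periodMs, long nowMs) {
        if (nowMs < firstMs) {
            return 0;
        }
        return (nowMs - firstMs) / periodMs + 1;
    }

    public static void main(String[] args) {
        long delayMs = 10 * 1000; // period quoted in the thread (DELAYMS)
        long start = 0;           // arbitrary reference instant
        // ASSUMPTION: the first execution time lies 10.5 periods before start.
        long past = start - (10 * delayMs + delayMs / 2);

        System.out.println(dueExecutions(past, delayMs, start));           // 11
        System.out.println(dueExecutions(past, delayMs, start + 1000));    // 11
        System.out.println(dueExecutions(past, delayMs, start + delayMs)); // 12
    }
}
```

Under these assumptions the count stays at 11 after a 1-second sleep, while a sleep of a full period makes a 12th execution due, which matches the rescheduling Martin expects in that case.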


