RFR: JDK-8212028: Use run-test makefile framework for testing in Oracle's Mach5

David Holmes david.holmes at oracle.com
Fri Oct 12 04:29:12 UTC 2018


Hi Jon,

On 12/10/2018 11:58 AM, Jonathan Gibbons wrote:

On 10/11/18 3:40 PM, David Holmes wrote:

Hi Erik,

On 12/10/2018 8:29 AM, Erik Joelsson wrote:

Hello,

(adding serviceability-dev and hotspot-dev for test changes)

Bug: https://bugs.openjdk.java.net/browse/JDK-8212028
Webrev: http://cr.openjdk.java.net/~erikj/8212028/webrev.01/index.html
(From ihse-runtestprebuilt-branch in jdk-sandbox)

In order to fully adopt the new run-test framework, we need to switch over the automated and distributed testing system at Oracle to the new framework. To get this to work, there are a number of issues that needed to be fixed. Here follows a brief explanation, see bug for more details.

For RunTest.gmk and related makefiles there are a number of minor tweaks to support all the necessary control variables that are currently used for the old test makefiles, as well as correcting some test setup settings.

In addition to that, some tests also needed to be modified:

Timeouts

The current default timeoutFactor in the makefiles is 4. However, the old Mach5 executor overrides that to 10. I don't think it should dabble with such things, but rather leave them to the makefiles, the user, or a specific job definition, so with the new run-test executor, it no longer does. This means many tests now have a much shorter effective timeout. Because of this, we need to increase the timeout on some that are now prone to timing out. I have run tier1-5 a few times to try and find these and added /timeout=300 (which will result in the same effective timeout as before) when specific tests seemed problematic.

This should be fixed in the tier job definitions, not the individual tests. We have moved away from putting explicit timeouts on individual tests and instead rely on the framework timeout being set appropriately.

David
-----

David,

That's a suboptimal policy, because it means you're relying on the framework handling the worst case test.

Yes. Given we have such a huge range of tests running on a range of platforms, on machines with a range of capabilities, using a range of VM flags and using a range of loads on the test machines, this has to be punted to the framework - otherwise you have to update every test to add an explicit timeout for the worst case (as experienced by some runner of the tests).

There's no holy-grail answer here.

My understanding of the current approach was to set the framework timeout so that the majority of tests running under a given "normal" execution context pass. Then add multipliers for specific test configurations or platforms known to take longer (-Xcomp or sparc, for example). Then tests that don't fit within that chosen timeout get either their own timeout set, or moved to a tier with a different multiplier.

This change basically lowers the bar that had been set such that more tests now need explicit timeouts. I'm not sure why that was necessary, nor do I think it necessarily a good thing.

But after some internal discussions the test folk seem to be okay with this, so having said my piece I'll let it drop.

As far as jtreg goes, the default timeout for each step is 2 mins, which is intended to be enough for the test to reliably run within that time on a reasonably modern developer-class machine.  A test which always times out on a good machine should use a test-specific increased timeout.

Agreed.

Where the framework can help is, if tests are being run on an old or slow machine, or if test run args are provided that will cause the test to run significantly slower than usual, then the framework can/should start scaling up the timeout factor.

Again agreed.

Cheers, David

-- Jon
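For reference, the timeout figures in this part of the thread can be worked through as below. This is only a sketch of the arithmetic, assuming the effective timeout is the test's declared (or default) timeout multiplied by the timeout factor. The class and method names are made up for illustration and are not part of jtreg or the makefiles.

```java
// Illustrative only: EffectiveTimeout and effectiveSeconds are hypothetical names.
class EffectiveTimeout {
    /** jtreg's default per-step timeout mentioned in the thread: 2 minutes. */
    static final int DEFAULT_TIMEOUT_SECONDS = 120;

    /** Assumed model: effective timeout = declared timeout * timeout factor. */
    static long effectiveSeconds(int declaredSeconds, double timeoutFactor) {
        return Math.round(declaredSeconds * timeoutFactor);
    }

    public static void main(String[] args) {
        // Old Mach5 executor: default timeout, factor overridden to 10.
        System.out.println(effectiveSeconds(DEFAULT_TIMEOUT_SECONDS, 10)); // 1200
        // run-test makefiles: default timeout, default factor 4.
        System.out.println(effectiveSeconds(DEFAULT_TIMEOUT_SECONDS, 4));  // 480
        // A test tagged /timeout=300, run with factor 4.
        System.out.println(effectiveSeconds(300, 4));                      // 1200
    }
}
```

Under that model, /timeout=300 with factor 4 gives the same 1200 seconds that the default 120 seconds gave with factor 10, which matches the "same effective timeout as before" remark.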

test/hotspot/jtreg/runtime/appcds/jvmti/InstrumentationTest.java This test spawns a child process and tries to locate it using the attach api, by looking for a unique token in the command line string of the spawned JVM. The problem is that the command line string it gets from the attach api is truncated and the token is last on the command line. This normally works well, but the arguments before it are 3 files, with full absolute paths inside the jtreg work directory. With Mach5 we have pretty deep work directories, and with run-test, we make them even deeper. This unfortunately trips the limit and the test fails. I have fixed this by reordering the arguments to the child process.

/Erik
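The truncation problem described for InstrumentationTest.java can be illustrated with a small stand-alone sketch. It is not the test's actual code: the length limit, the paths, the token, and every name below are hypothetical, chosen only to show why a token placed last can be cut off when earlier arguments are long absolute paths, and why moving it earlier avoids that.

```java
// Illustrative only: all names, paths and the limit are hypothetical.
class TruncatedCommandLine {
    /** Simulates an API that reports only the first maxLen characters. */
    static String truncate(String commandLine, int maxLen) {
        return commandLine.length() <= maxLen
                ? commandLine
                : commandLine.substring(0, maxLen);
    }

    /** Joins arguments the way a command line string is usually displayed. */
    static String join(String... args) {
        return String.join(" ", args);
    }

    /** Builds a made-up deep work directory with the given nesting depth. */
    static String deepPath(int depth) {
        StringBuilder sb = new StringBuilder("/work");
        for (int i = 0; i < depth; i++) {
            sb.append("/nested");
        }
        return sb.append('/').toString();
    }

    public static void main(String[] args) {
        String token = "unique-token";   // hypothetical marker
        String dir = deepPath(30);       // hypothetical deep directory
        int limit = 200;                 // hypothetical truncation limit

        String tokenLast = join("Main", dir + "a.jar", dir + "b.jar", token);
        String tokenFirst = join("Main", token, dir + "a.jar", dir + "b.jar");

        // Token at the end is lost once the paths exceed the limit.
        System.out.println(truncate(tokenLast, limit).contains(token));  // false
        // Token near the front survives truncation.
        System.out.println(truncate(tokenFirst, limit).contains(token)); // true
    }
}
```

The same reasoning applies whatever the real limit is: the deeper the work directory, the more of the limit the path arguments use up, so anything that must remain visible should come before them.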



More information about the build-dev mailing list