Fwd: Tiered compilation and virtual call heuristics
Vladimir Kozlov vladimir.kozlov at oracle.com
Thu Jul 30 02:08:56 UTC 2015
Hi Carsten,
The main issue here is that without tiered compilation, the interpreter starts collecting profiling information only after 3300 invocations (InterpreterProfilePercentage). As a result, data from the first invocations is not recorded. On the other hand, with tiered compilation, C1 compilation (with profiling code) is triggered after 100 invocations, so you get a lot more data, as you observed.
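The 3300 figure is a percentage of the compile threshold. A minimal sketch, assuming the JDK 8 non-tiered server defaults CompileThreshold=10000 and InterpreterProfilePercentage=33 (check your build with -XX:+PrintFlagsFinal); the class and method names are hypothetical. It compares how many of a method's first 2000 invocations happen before profiling begins in each mode, using the 100-invocation figure quoted above for tiered.

```java
/** Hypothetical helper: when does profile recording start in each mode? */
public class ProfileStart {
    // Assumed JDK 8 server defaults (non-tiered); verify with -XX:+PrintFlagsFinal.
    static final int COMPILE_THRESHOLD = 10_000;
    static final int INTERPRETER_PROFILE_PERCENTAGE = 33;
    // Figure quoted in the reply for when C1 profiling code kicks in with tiered.
    static final int TIERED_PROFILE_START = 100;

    /** Invocation count after which the non-tiered interpreter records profile data. */
    static int interpreterProfileLimit(int compileThreshold, int percentage) {
        return compileThreshold * percentage / 100;
    }

    /** How many of the first n invocations occur before profiling starts at 'start'. */
    static int unrecorded(int n, int start) {
        return Math.min(n, start);
    }

    public static void main(String[] args) {
        int limit = interpreterProfileLimit(COMPILE_THRESHOLD, INTERPRETER_PROFILE_PERCENTAGE);
        System.out.println("non-tiered profiling starts after " + limit + " invocations");
        // Of the first 2000 invocations, how many leave no trace in the profile?
        System.out.println("non-tiered unrecorded: " + unrecorded(2000, limit));
        System.out.println("tiered unrecorded: " + unrecorded(2000, TIERED_PROFILE_START));
    }
}
```

Under these assumptions, none of the first 2000 invocations are profiled without tiered, while with tiered only the first 100 are missed, so early, merely warm receivers end up in the profile.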
If you can sacrifice startup performance, you can try using CompileThresholdScaling to raise the compilation thresholds and delay compilations.
Alternatively, you can try increasing Tier3InvocationThreshold and Tier3CompileThreshold to delay only C1 compilation. Here is the formula from simpleThresholdPolicy.inline.hpp:
    return (i >= Tier3InvocationThreshold * scale) ||
           (i >= Tier3MinInvocationThreshold * scale &&
            i + b >= Tier3CompileThreshold * scale);
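A minimal Java transcription of the predicate above, to show how raising the thresholds delays the tier-3 (C1 with profiling) decision. Here i is the method's invocation count and b its backedge (loop) count; the threshold values are my assumption of the JDK 8 defaults (verify with -XX:+PrintFlagsFinal), and the class and method names are hypothetical.

```java
/** Hypothetical transcription of the tier-3 call predicate quoted above. */
public class Tier3Predicate {
    // Assumed JDK 8 defaults; verify with -XX:+PrintFlagsFinal.
    static final int TIER3_INVOCATION_THRESHOLD = 200;
    static final int TIER3_MIN_INVOCATION_THRESHOLD = 100;
    static final int TIER3_COMPILE_THRESHOLD = 2000;

    /** i = invocation count, b = backedge count, scale = multiplier on all thresholds. */
    static boolean shouldCompileAtTier3(int i, int b, double scale) {
        return (i >= TIER3_INVOCATION_THRESHOLD * scale)
            || (i >= TIER3_MIN_INVOCATION_THRESHOLD * scale
                && i + b >= TIER3_COMPILE_THRESHOLD * scale);
    }

    public static void main(String[] args) {
        System.out.println(shouldCompileAtTier3(200, 0, 1.0));    // true: invocation threshold hit
        System.out.println(shouldCompileAtTier3(150, 0, 1.0));    // false: neither clause holds
        System.out.println(shouldCompileAtTier3(100, 1900, 1.0)); // true: min invocations + loops
        System.out.println(shouldCompileAtTier3(200, 0, 10.0));   // false: thresholds scaled 10x
    }
}
```

A scale of 10.0 has the same effect here as multiplying all three thresholds by 10. Note also that with b = 0 the second clause can only fire once i reaches Tier3CompileThreshold, so with these defaults Tier3InvocationThreshold is the effective trigger for loop-free methods.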
But if you have a truly "flat" profile (all called methods are relatively warm), nothing will help you.
If some methods are relatively hot, you can work around this by calling them at the beginning. For example, if count400(0) were called first (or second), you would get a record for it in the MDO. Then you can try lowering TypeProfileMajorReceiverPercent to avoid a virtual call at least for one hot method (recorded in the MDO):
    product(intx, TypeProfileMajorReceiverPercent, 90, "% of major receiver type to all profiled receivers")
Regards, Vladimir
On 7/22/15 10:37 AM, Carsten Varming wrote:
Dear Hotspot compiler group,
I have had a few issues with tiered compilation in JDK 8 lately and was wondering if you have any comments or ideas about the problem. Here is my problem as I currently understand it; feel free to correct any misunderstandings I may have.
With tiered compilation, the heuristics for inlining virtual calls seem to degrade quite a bit. I think this is because MethodData objects are created much earlier with tiered than without. This causes the tracking of the hottest target methods at a virtual call site to go awry, due to the limit (2) on the number of MethodData objects that can be associated with a bci in a method. It seems that the only virtual call targets tracked are the ones that are warm when C1 is invoked.
The program ends up with all call sites in scala.collection.IndexedSeqOptimized.slice using virtual dispatch with tiered, and bimorphic call sites without tiered. The end result with tiered is a tripling of the CPU required to run the program, and instruction pointers from the compiled slice method show up in 90% of all CPU samples (collected with perf at 4 kHz).
The problem is with a small application built in Scala on top of Netty. I have written a small sample program (see attached Main.java) to spare you the details (and to be able to give you code). When I run the sample program with tiered, the call to count ends up being a virtual call, due to Instance$3.count and Instance4.count being warm when C1 kicks in. Without tiered, Instance$1.count is the only hot method.
I wonder if you guys have seen this problem in the wild or if I just happen to be unlucky. Increasing BciProfileWidth should help in my case, but it is not a product flag. Do you have any experience regarding the cost of increasing BciProfileWidth? Do you have any thoughts on throwing out MethodData objects for virtual call sites that turn out to be pretty cold?
Thank you in advance for your thoughts,
Carsten
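To make the interaction concrete, here is a simplified, hypothetical Java model of one call site (class and method names invented for illustration; this is not HotSpot's actual profiling code). The profile has two receiver rows filled first come, first served; once both are taken, further unseen types only bump an untracked counter. The two-row width mirrors the limit of 2 described above, and I assume it corresponds to HotSpot's TypeProfileWidth flag. The decision rule is only a rough stand-in for C2's: inline a receiver whose share reaches TypeProfileMajorReceiverPercent, use a bimorphic site if two rows cover every call, otherwise dispatch virtually. The profiling start points 100 and 3300 are the figures from the reply; the call counts are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified, hypothetical model of a fixed-width receiver-type profile at one call site. */
public class ReceiverProfileModel {
    private final int width;                                   // rows per call site
    private final Map<String, Long> rows = new LinkedHashMap<>();
    private long other;                                        // calls that found no free row

    ReceiverProfileModel(int width) { this.width = width; }

    void record(String type) {
        if (rows.containsKey(type)) {
            rows.merge(type, 1L, Long::sum);
        } else if (rows.size() < width) {
            rows.put(type, 1L);                                // first types seen claim the rows
        } else {
            other++;                                           // rows full: type stays untracked
        }
    }

    /** Rough call-site decision in the spirit of TypeProfileMajorReceiverPercent. */
    String decide(int majorPercent) {
        long total = other;
        for (long c : rows.values()) total += c;
        if (total == 0) return "virtual";
        for (Map.Entry<String, Long> e : rows.entrySet()) {
            if (e.getValue() * 100 >= (long) majorPercent * total) {
                return "inline " + e.getKey();
            }
        }
        return (other == 0 && rows.size() == 2) ? "bimorphic" : "virtual";
    }

    /** Warm-up calls on each warm type in turn, then hotCalls on hotType; recording starts at 'start'. */
    static ReceiverProfileModel simulate(int start, String[] warmTypes, int warmCallsEach,
                                         String hotType, int hotCalls) {
        ReceiverProfileModel p = new ReceiverProfileModel(2);
        int call = 0;                                          // calls seen at this site so far
        for (String t : warmTypes) {
            for (int k = 0; k < warmCallsEach; k++) {
                if (call++ >= start) p.record(t);
            }
        }
        for (int k = 0; k < hotCalls; k++) {
            if (call++ >= start) p.record(hotType);
        }
        return p;
    }

    public static void main(String[] args) {
        String[] warm = {"Instance$3", "Instance4"};
        // Early start (tiered-like): the warm types claim both rows.
        System.out.println(simulate(100, warm, 1000, "Instance$1", 10_000).decide(90));
        // Late start (non-tiered-like): the warm-up is never recorded.
        System.out.println(simulate(3300, warm, 1000, "Instance$1", 10_000).decide(90));
        // Workaround: call the hot type first, then compare default and lowered percentages.
        ReceiverProfileModel p = simulate(100,
            new String[] {"Instance$1", "Instance$3", "Instance4"}, 1000, "Instance$1", 10_000);
        System.out.println(p.decide(90) + " / " + p.decide(80));
    }
}
```

With these made-up counts, the early start leaves Instance$1 untracked and the site stays virtual, the late start records only Instance$1 and inlines it, and calling the hot type first only pays off once the percentage is lowered, since its share (10900 of 12900 calls, about 84.5%) stays below 90.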
More information about the hotspot-compiler-dev mailing list