Discussion: 8172978: Remove Interpreter TOS optimization
Doerr, Martin martin.doerr at sap.com
Fri Feb 24 16:40:39 UTC 2017
Hi Max,
thank you very much for sharing your results and for sending the patch.
I guess it covers the most relevant cases, but not all of them. I think it'd be better to modify dispatch_next instead of dispatch_epilog on x86. (dispatch_next is also used by generate_return_entry_for and generate_deopt_entry_for.)
On s390, I'm using dispatch_next with:
    if (!EnableTosCache) { push(state); state = vtos; }
    dispatch_base(state, Interpreter::dispatch_table(state));
I also added an assertion to dispatch_base in order to make sure I'm hitting all dispatch usages:
    assert(EnableTosCache || state == vtos, "sanity");
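To illustrate what the s390 change does, here is a toy C++ model. It is not HotSpot code: `ToyInterpreter`, the two-state `TosState` enum and the counters are hypothetical. With the cache enabled, a value produced in `itos` stays in a register for the next bytecode; with it disabled, `dispatch_next` spills it with `push(state)` and dispatches in `vtos`, which is exactly the memory traffic the TOS optimization was meant to avoid. The assertion mirrors the sanity check above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model only: two TOS states, itos (an int cached in a register)
// and vtos (nothing cached; values live on the in-memory expression stack).
enum TosState { itos, vtos };

struct ToyInterpreter {
  explicit ToyInterpreter(bool cache) : enable_tos_cache(cache) {}

  bool enable_tos_cache;             // stands in for -XX:+/-EnableTosCache
  std::vector<std::int64_t> stack;   // in-memory expression stack
  std::int64_t tos_register = 0;     // cached top-of-stack value, valid in itos
  int memory_pushes = 0;             // how often the cached value was spilled

  // Models push(state): spill the cached value, if any, to memory.
  void push(TosState state) {
    if (state == itos) {
      stack.push_back(tos_register);
      ++memory_pushes;
    }
  }

  // Models dispatch_base: returns the state the next bytecode is entered in.
  TosState dispatch_base(TosState state) const {
    // Without the cache, every dispatch must happen in vtos.
    assert(enable_tos_cache || state == vtos);
    return state;
  }

  // Models the s390 dispatch_next described in the mail.
  TosState dispatch_next(TosState state) {
    if (!enable_tos_cache) {
      push(state);   // write the cached value back to the stack
      state = vtos;  // the next bytecode finds nothing in the register
    }
    return dispatch_base(state);
  }
};
```

For example, `ToyInterpreter(false).dispatch_next(itos)` returns `vtos` after one spill, while a cached interpreter returns `itos` and touches no memory.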
Unfortunately, the performance results of SPEC jvm98 with -Xint seem to drop significantly with -XX:-EnableTosCache on both PPC64 and s390. But we need to perform more measurements to get more reliable results.
Best regards, Martin
-----Original Message-----
From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Max Ockner
Sent: Thursday, 23 February 2017 22:21
To: hotspot-dev at openjdk.java.net
Subject: Re: Discussion: 8172978: Remove Interpreter TOS optimization
Hi Volker, I have attached the patch that I have been testing. Thanks, Max
On 2/20/2017 5:45 AM, Volker Simonis wrote:
Hi,
Besides the fact that this of course means some work for us :), I currently don't see any problems for our porting platforms (ppc64 and s390x). Are there any webrevs available so we can see how big the changes are and maybe do some benchmarking of our own? Thanks, Volker
On Sun, Feb 19, 2017 at 11:11 PM, <coleen.phillimore at oracle.com> wrote:
On 2/18/17 11:14 AM, coleen.phillimore at oracle.com wrote: When Max gets back from the long weekend, he'll post the platforms in your bug.
It's amazing that for -Xint there's no significant difference. I've seen -Xint performance that was 15% slower cause a 2% slowdown with the server VM, but that was before tiered compilation. I should clarify this: I've seen this slowdown for different interpreter optimizations, which can affect server performance. I was measuring SPECjvm98 on Linux x64. If there's no significant difference for this TOS optimization, there is no chance of a degradation in overall performance. Coleen
The reason for this query was to see what developers for the other platform ports think, since this change would affect all of the platforms. Thanks, Coleen
On 2/18/17 10:50 AM, Daniel D. Daugherty wrote: If Claes is happy with the perf testing, then I'm happy. :-)
Dan
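Coleen's figures can be read through a simple dilution model. The model is an assumption of this sketch, not something stated in the thread: if a fraction f of total run time T is spent in the interpreter, and only that part becomes slower by a relative amount s, the overall slowdown is exactly f·s.

```latex
\begin{aligned}
T &= T_{\mathrm{int}} + T_{\mathrm{rest}}, \qquad T_{\mathrm{int}} = f\,T \\
T' &= (1+s)\,T_{\mathrm{int}} + T_{\mathrm{rest}} = T + s\,f\,T \\
\frac{T'}{T} - 1 &= f\,s \\
0.02 = f \cdot 0.15 \;&\Rightarrow\; f = 0.02 / 0.15 \approx 0.13
\end{aligned}
```

Under this model, a 15% interpreter slowdown producing a 2% overall slowdown implies that roughly 13% of that run was spent interpreting. Tiered compilation, which moves hot code out of the interpreter sooner, would be expected to shrink f and with it the overall effect, in line with the dilution argument in Max's original mail.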
On 2/18/17 3:46 AM, Claes Redestad wrote: Hi,
I've seen that Max has run plenty of tests on our internal performance infrastructure, and everything I've seen there seems to corroborate the idea that this removal is OK from a performance point of view. The footprint improvements are small but significant, and any negative performance impact on throughput benchmarks is at noise levels, even with -Xint (it appears many benchmarks time out with this setting both before and after, though; Max, let's discuss offline how to deal with that :-)).
I expect this will be tested more thoroughly once adapted to all platforms (which I assume is the intent?), but I see no concern from a performance testing point of view: Do it! Thanks! /Claes
On 2017-02-16 16:40, Daniel D. Daugherty wrote: Hi Max, Added a note to your bug. Interesting idea, but I think your data is a bit incomplete at the moment. Dan
On 2/15/17 3:18 PM, Max Ockner wrote: Hello all,
We have filed a bug to remove the interpreter stack caching optimization for jdk10. Ideally we can make this change early during the jdk10 development cycle. See below for justification:
Bug: https://bugs.openjdk.java.net/browse/JDK-8172978
Stack caching has been around for a long time and is intended to replace some of the load/store (pop/push) operations with corresponding register operations. The need for this optimization arose before hardware caches could adequately lessen the burden of memory access. We have reevaluated the JVM stack caching optimization and have found that it has a high memory footprint and is very costly to maintain, but does not provide significant measurable or theoretical benefit for us when used with modern hardware.
Minimal Theoretical Benefit.
Because modern hardware does not slap us with the same cost for accessing memory as it once did, the benefit of replacing memory access with register access is far less dramatic now than it once was. Additionally, the interpreter runs for a relatively short time before relevant code sections are compiled. When the VM starts running compiled code instead of interpreted code, performance should begin to move asymptotically towards that of compiled code, diluting any performance penalties from the interpreter to small performance variations.
No Measurable Benefit.
Please see the results files attached in the bug page. This change was adapted for x86 and SPARC, and interpreter performance was measured with SPECjvm98 (run with -Xint). No significant decrease in performance was observed.
Memory footprint and code complexity.
Stack caching in the JVM is implemented by switching the instruction look-up table depending on the TOS (top-of-stack) state. At any moment there is an active table consisting of one dispatch table for each of the 10 TOS states. When we enter a safepoint, we copy all 10 safepoint dispatch tables into the active table. The additional entry code makes this copy less efficient and makes any work in the interpreter harder to debug.
If we remove this optimization, we will:
- decrease memory usage in the interpreter,
- eliminate wasteful memory transactions during safepoints,
- decrease code complexity (a lot).
Please let me know what you think. Thanks, Max
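The table arrangement Max describes (an active table holding one dispatch table per TOS state, overwritten by the safepoint tables when a safepoint begins) can be sketched as a toy C++ model. Everything here is an illustrative assumption: `ToyDispatch`, `DispatchTable`, `enter_safepoint`, `leave_safepoint` and `bytes_per_copy` are hypothetical names, and the real HotSpot structures differ in detail. The sketch assumes 10 TOS states and one entry per possible bytecode value (256), and shows that each table switch copies all ten per-state tables.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy sizes (assumed): 10 TOS states, one entry per bytecode value 0..255.
constexpr std::size_t kNumTosStates = 10;
constexpr std::size_t kNumBytecodes = 256;

using Entry = const void*;  // address of a bytecode's template code
using StateTable = std::array<Entry, kNumBytecodes>;

// One dispatch table per TOS state.
struct DispatchTable {
  std::array<StateTable, kNumTosStates> per_state{};
};

// Hypothetical model of the active/normal/safepoint arrangement.
struct ToyDispatch {
  DispatchTable active;     // what the interpreter dispatches through
  DispatchTable normal;     // entries for ordinary execution
  DispatchTable safepoint;  // entries that stop at a safepoint

  // Dispatch picks the entry for the current TOS state and the next bytecode.
  Entry lookup(std::size_t state, std::size_t bytecode) const {
    assert(state < kNumTosStates && bytecode < kNumBytecodes);
    return active.per_state[state][bytecode];
  }

  // Entering a safepoint copies every per-state safepoint table; returns bytes copied.
  std::size_t enter_safepoint() {
    active = safepoint;
    return sizeof(active);
  }

  // Leaving the safepoint restores the normal entries; returns bytes copied.
  std::size_t leave_safepoint() {
    active = normal;
    return sizeof(active);
  }
};

// Bytes copied per table switch when `states` TOS states each have a table.
constexpr std::size_t bytes_per_copy(std::size_t states) {
  return states * kNumBytecodes * sizeof(Entry);
}
```

With 8-byte pointers, `bytes_per_copy(10)` is 10 × 256 × 8 = 20480 bytes per switch, ten times the 2048 bytes a single per-state table would take; the toy model keeps three such tables (active, normal, safepoint), or 61440 bytes of entries.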