RFR(L): 8074981 (Integer/FP scalar reduction optimization )

Vladimir Kozlov vladimir.kozlov at oracle.com
Tue Mar 31 02:50:26 UTC 2015


On 3/30/15 7:20 PM, Berg, Michael C wrote:

Almost, but it's more than that: long support in AVX2 has missing components, so we only allow what superword can currently process safely and bypass the question of long support for reductions until AVX3, where support is complete enough to allow those forms of reductions.

Okay.

Nils was the initial reviewer and sponsor, so Nils, can you make another pass and comment on the current webrev for the review?

Nils is out for a few days. Christian looked at this too, so let him do the second review.

Thanks, Vladimir

Thanks,
-Michael

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Monday, March 30, 2015 6:47 PM
To: Berg, Michael C
Cc: 'hotspot-compiler-dev at openjdk.java.net'
Subject: Re: RFR(L): 8074981 (Integer/FP scalar reduction optimization )

Here is the updated webrev, which addresses these and other issues:

http://cr.openjdk.java.net/~kvn/8074981/webrev.01/

Michael, I noticed that the .ad file does not have matched instructions for AddReductionVL. I assume it is because there is no avx3 yet. Right?

Otherwise this looks good to me. You need a second review from another Reviewer since the changes are big.

Thanks,
Vladimir

On 3/25/15 4:00 PM, Vladimir Kozlov wrote:

Please ignore the previous email. I screwed up Michael's email address.

Hi Michael,

I have a few major concerns which you need to address.

Adding a new field attr to the Node class should be avoided: it will significantly increase the memory footprint of the graph and will not be used frequently (vectorization is a rare case).

NodeFlags has only 16 bits and you used 2. And I don't see how Flagisloopcarrieddep is used.

All of the above comes down to one question: why is markreductions() executed in loopopts before each unroll and not during superword processing? If you do markreductions() in superword, you can use a VectorSet to indicate which nodes are reduction nodes.

The same goes for attr. Why store alignment in Node instead of using nodeinfo in packseteval()?

A small note. Instead of:

+ Node *defNode = n->in(len - 1);

use:

+ Node *defNode = n->in(LoopNode::LoopBackControl);

Thanks,
Vladimir
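The footprint argument above can be illustrated with a small standalone sketch. This is not HotSpot code: ToyNode, ToyNodeWithAttr and ReductionMarks are hypothetical names, and a std::vector<bool> indexed by node id merely stands in for a VectorSet. The assumption is that nodes have dense indices, so a pass can keep its marks in a side table that exists only while the pass runs, instead of paying for an extra field on every node.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy node: NOT HotSpot's Node class.
struct ToyNode {
    uint32_t idx;    // dense node index, used as the key into side tables
    uint16_t flags;  // small, fixed flag budget
};

// The variant the review argues against: every node carries an extra field,
// even though only nodes in vectorized loops would ever use it.
struct ToyNodeWithAttr {
    uint32_t idx;
    uint16_t flags;
    uint32_t attr;   // extra per-node storage
};

// Side table owned by the vectorization pass: one bit per node index,
// allocated only while the pass runs.
class ReductionMarks {
public:
    void set(uint32_t idx) {
        if (idx >= bits_.size()) bits_.resize(idx + 1, false);
        bits_[idx] = true;
    }
    bool test(uint32_t idx) const {
        return idx < bits_.size() && bits_[idx];
    }
private:
    std::vector<bool> bits_;
};
```

With this shape, marking a node is `marks.set(node.idx)` and the query is `marks.test(node.idx)`; nodes outside the loop being vectorized cost nothing.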

On 3/25/15 1:09 PM, Berg, Michael C wrote:

Christian/Nils: Any additional comments for the review? If not, on Thursday I will upload the final webrev with the requested change.

Thanks,
-Michael

From: Berg, Michael C
Sent: Thursday, March 19, 2015 5:55 PM
To: Christian Thalinger
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: RE: RFR(L): 8074981 (Integer/FP scalar reduction optimization )

Christian, yes, we could rely on the base class definitions instead, since we are not augmenting arguments. I will remove the file changes after the review concludes, in case there are any other modifications.

Thanks,
-Michael

From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
Sent: Thursday, March 19, 2015 3:52 PM
To: Berg, Michael C
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(L): 8074981 (Integer/FP scalar reduction optimization )

On Mar 19, 2015, at 3:23 PM, Berg, Michael C <michael.c.berg at intel.com> wrote:

I have updated the webrev contents after some feedback (with no code changes), and Vladimir has placed it in a location everyone can access. Anyone should be able to apply the patch or review the code from this info:

http://cr.openjdk.java.net/~kvn/8074981/webrev.00/

src/cpu/x86/vm/macroAssemblerx86.hpp:

Why do we need these methods? MacroAssembler extends Assembler.

This replaces the JBS version of the webrev files for 8074981.

Thanks,
-Michael

-----Original Message-----
From: Filipp Zhinkin [mailto:filipp.zhinkin at gmail.com]
Sent: Thursday, March 19, 2015 12:55 AM
To: Berg, Michael C
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(L): 8074981 (Integer/FP scalar reduction optimization )

Michael,

I've got it, thank you for the explanation.

Regards,
Filipp.

On Wed, Mar 18, 2015 at 5:53 PM, Berg, Michael C <michael.c.berg at intel.com> wrote:

Filipp, for large iteration loops, if I am taking your meaning correctly, you could not do that without splitting the loop and re-architecting it into a loop nest pair to manage the reduction components. It seems the overhead from that scenario could create cost issues where reductions could actually hamper performance in small vector expressions. Right now we never degrade and generally benefit with the implementation as it stands, with the reductions stitched into the vector unit computations directly.

Regarding sub/div/etc: For now we have held off on non-commuting operations like sub and div; they would have to be very strictly managed via pack-set placement. But the answer is yes, we could support them.

Thanks,
-Michael

-----Original Message-----
From: Filipp Zhinkin [mailto:filipp.zhinkin at gmail.com]
Sent: Wednesday, March 18, 2015 2:20 AM
To: Berg, Michael C
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(L): 8074981 (Integer/FP scalar reduction optimization )

Hi Michael,

thank you for contributing such a great improvement!

Sorry if my question is silly, but I'm curious: wouldn't it be better to replace the integer scalar reduction variable with a vector "Rv" in the loop's prologue, compile the loop's body as a regular vectorized addition/multiplication, and reduce "Rv" to a scalar in the loop's epilogue?

Why didn't you add SubReduction* nodes?

Best regards,
Filipp.

On Tue, Mar 17, 2015 at 12:40 AM, Berg, Michael C <michael.c.berg at intel.com> wrote:

Hi All,

We would like to contribute the Integer/FP scalar reduction optimization from Intel. The contribution is referenced as Bug ID 8074981 as a performance enhancement.

Please review this patch:

Bug-id: https://bugs.openjdk.java.net/browse/JDK-8074981
webrev: https://bugs.openjdk.java.net/secure/attachment/26101/webrev.zip

The optimization achieves as much as a 2.3x speedup on integer reductions and supports float and double precision optimizations, which also show significant uplift and obey strict fp constraints.

Nils Eliasson has offered to sponsor this patch.

Thanks,
-Michael
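For readers following the thread, the alternative Filipp asks about (a vector accumulator set up before the loop and reduced to a scalar after it) can be sketched in plain C++. This is only an illustration of that question, not the patch's approach, which per Michael's reply stitches the reductions into the vector computations directly. The function names and the lane count of 4 are assumptions made for the example, and the integer sums are assumed not to overflow. The last two functions show why float and double reductions are stricter: floating-point addition is not associative, so regrouping the operands can change the result.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reduction: one accumulator carried across every iteration.
int sum_scalar(const std::vector<int>& a) {
    int r = 0;
    for (int x : a) r += x;
    return r;
}

// Lane-wise form of the same reduction: kLanes partial sums stand in for a
// vector register "Rv" and are combined once after the main loop.
// kLanes = 4 is an arbitrary illustrative width.
int sum_lanes(const std::vector<int>& a) {
    constexpr std::size_t kLanes = 4;
    std::array<int, kLanes> rv{};                // prologue: zero the lanes
    std::size_t i = 0;
    for (; i + kLanes <= a.size(); i += kLanes)  // body: element-wise add
        for (std::size_t l = 0; l < kLanes; ++l) rv[l] += a[i + l];
    int r = 0;
    for (int lane : rv) r += lane;               // epilogue: horizontal reduce
    for (; i < a.size(); ++i) r += a[i];         // scalar tail
    return r;
}

// Float addition is not associative, so regrouping the same operands can
// change the result; an order-preserving FP reduction must not regroup.
float sum_left_to_right(float a, float b, float c) { return (a + b) + c; }
float sum_regrouped(float a, float b, float c)     { return a + (b + c); }
```

For integer addition the two forms agree because the operation is associative and commutative. With the inputs 1e20f, -1e20f and 1.0f, the left-to-right float sum is 1.0f while the regrouped sum is 0.0f, which is the kind of difference strict fp rules do not permit.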


