RFR 8080325 SuperWord loop unrolling analysis (original) (raw)

Vladimir Kozlov vladimir.kozlov at oracle.com
Fri Jun 5 19:55:07 UTC 2015


Looks good. I would invert not_slp variable to positive is_slp - if(is_slp) looks more readable.

Thanks, Vladimir

On 6/4/15 9:45 PM, Berg, Michael C wrote:

Vladimir, please find the following webrev with pretty much the full list of changes made. I made some improvements too. For instance I only allow the analysis to take place when we are trying to unroll beyond the default.

http://cr.openjdk.java.net/~mcberg/8080325/webrev.01/ Regards, -Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Wednesday, June 03, 2015 5:25 PM To: Berg, Michael C; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR 8080325 SuperWord loop unrolling analysis Thank you, Michael, for this contribution. First, I am fine with such approach - call superword only to collect data about loop. It could be useful for superword optimization to have a pass over loop's nodes to determine if it could be vectorize as separate phase. And avoid to do that in SLP analysis. Make SuperWordLoopUnrollAnalysis flag's default value to 'false' and set it to true only in vmversionx86.cpp (#ifdef COMPILER2) so that you don't need to modify setting on all platforms. In flag description say what 'slp' means (Superword Level Parallelism). Code style: if (cl->isreductionloop() == false) phase->markreductions(this); should be: if (!cl->isreductionloop()) { phase->markreductions(this); } An other one: cl->haspassedslp() == false. We use ! for such cases. There are following checks after new code which prevent unrolling. Why you do analysis before them without affecting their decision? And you use the result of analysis only later at the end of method. Why not do analysis there then? (localloopunrollfactor > 4) check should be done inside policyslpmaxunroll() to have only one check. Actually all next code (lines 668-692) could be done at the end of policyslpmaxunroll(). And you don't need to return bool then (I changed name too): 665 if (LoopMaxUnroll > localloopunrollfactor) { 666 // Once policyslpanalysis succeeds, mark the loop with the 667 // maximal unroll factor so that we minimize analysis passes 668 policyunrollslpanalysis(cl, phase); 693 } 694 } 695 696 // Check for initial stride being a small enough constant 697 if (abs(cl->stridecon()) > (1<<2)*futureunrollct) return false;_ _I think slp analysis code should be in superword.cpp - SWPointer should be used only there. Just add an other method to SuperWord class._ _Instead of a->Afree() and next: 810 Arena *a = Thread::current()->resourcearea(); 812 sizet ignoredsize = body.size()sizeof(int); 813 int ignoredloopnodes = (int)a->AmallocD(ignoredsize); 814 NodeStack nstack((int)ignoredsize); use: ResourceMark rm; sizet ignoredsize = body.size(); int *ignoredloopnodes = NEWRESOURCEARRAY(int, ignoredsize); NodeStack nstack((int)ignoredsize); NodeStack should take number of nodes and not bytes. I am concern about nstack.clear() because you have poped all nodes on it. Thanks, Vladimir On 5/13/15 6:26 PM, Berg, Michael C wrote: Hi Folks,

We (Intel) would like to contribute superword loop unrolling analysis to facilitate more default unrolling and larger SIMD vector mapping. Please review this patch and comment as needed: Bug-id: https://bugs.openjdk.java.net/browse/JDK-8080325 webrev: http://cr.openjdk.java.net/~kvn/8080325/webrev/ The design finds the highest common vector supported and implemented on a given machine and applies that to unrolling, iff it is greater than the default. If the user gates unrolling we will still obey the user directive. It's light weight, when we get to the analysis part, if we fail, we stop further tries. If we succeed we stop further tries. We generally always analyze only once. We then gate the unroll factor by extending the size of the unroll segment so that the iterative tries will keep unrolling, obtaining larger unrolls of a targeted loop. I see no negative behavior wrt to performance, and a modest positive swing in default behavior up to 1.5x for some micros. Vladimir Koslov has offered to sponsor this patch.



More information about the hotspot-compiler-dev mailing list