[flang][OpenMP] Upstream do concurrent loop-nest detection. by ergawy · Pull Request #127595 · llvm/llvm-project
This was referenced
Mar 4, 2025
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request
This PR starts the effort to upstream AMD's internal implementation of do concurrent to OpenMP mapping. This replaces llvm#77285 since we extended this WIP quite a bit on our fork over the past year.
An important part of this PR is a document that describes the current status downstream, the upstreaming status, and next steps to make this pass much more useful.
In addition to this document, this PR also contains the skeleton of the pass (no useful transformations are done yet) and some testing for the added command line options.
This looks like a huge PR but a lot of the added stuff is documentation.
It is also worth noting that the downstream pass has been validated on https://github.com/BerkeleyLab/fiats. For the CPU mapping, this achieved performance speed-ups that match pure OpenMP; for the GPU mapping, we are still working on extending our support for implicit memory mapping and locality specifiers.
PR stack:
- llvm#126026 (this PR)
- llvm#127595
- llvm#127633
- llvm#127634
- llvm#127635
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request
Upstreams the next part of do concurrent to OpenMP mapping pass (from AMD's ROCm implementation). See llvm#126026 for more context.
This PR adds loop nest detection logic. This enables us to discover multi-range do concurrent loops and then map them as "collapsed" loop nests to OpenMP.
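To make the target of this mapping concrete, here is a minimal C++ sketch (not code from this PR) of a collapsed loop nest in OpenMP. It assumes a two-range Fortran loop such as `do concurrent (i=1:n, j=1:m)` can be modeled as two perfectly nested `for` loops; the function name `fill_collapsed` and the loop body are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the body of a two-range `do concurrent` loop:
// every (i, j) pair writes its own element, so iterations are independent.
std::vector<int> fill_collapsed(int rows, int cols) {
  assert(rows >= 0 && cols >= 0);
  std::vector<int> out(static_cast<std::size_t>(rows) *
                       static_cast<std::size_t>(cols), 0);
  int *data = out.data();

  // collapse(2) asks OpenMP to treat both loops as a single iteration space
  // of rows * cols iterations and to distribute that space across threads.
  // If OpenMP is not enabled, the pragma is ignored and the nest runs serially.
#pragma omp parallel for collapse(2)
  for (int i = 0; i < rows; ++i) {
    for (int j = 0; j < cols; ++j) {
      data[i * cols + j] = 10 * i + j;
    }
  }
  return out;
}
```

Detecting that the ranges form one nest is what would allow a compiler to emit the `collapse` form rather than parallelizing only the outermost loop.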
This is a follow-up to llvm#126026; only the latest commit is relevant.
This is a replacement for llvm#127478 using a /user/<username>/<branchname> branch.
PR stack:
- llvm#126026
- llvm#127595 (this PR)
- llvm#127633
- llvm#127634
- llvm#127635
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request
…ructs (llvm#127633)
Upstreams one more part of the ROCm do concurrent to OpenMP mapping pass. This PR adds support for converting simple loops to the equivalent OpenMP constructs on the host: omp parallel do. To that end, we have to collect more information about loop nests, for which we add new utils in the looputils namespace.
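As a rough illustration of the host mapping described above, the following C++ sketch (an assumption-laden analogue, not the pass's output) shows a single-range loop with inclusive Fortran-style bounds and a step, parallelized with `omp parallel for`, the C++ counterpart of Fortran's `omp parallel do`. The function `squares` and its body are hypothetical; the bounds and step stand in for the kind of per-loop information such a conversion would need.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical loop: for i = lb, lb + step, ..., up to and including ub,
// store i * i at index i. Untouched elements keep the sentinel value -1.
std::vector<long> squares(long lb, long ub, long step) {
  assert(step > 0 && lb >= 0);
  std::vector<long> out(ub >= lb ? static_cast<std::size_t>(ub + 1) : 0, -1);
  long *dst = out.data();

  // Each iteration writes a distinct element, so the loop can be shared
  // among threads. Without OpenMP enabled, the pragma is simply ignored.
#pragma omp parallel for
  for (long i = lb; i <= ub; i += step) {
    dst[i] = i * i;
  }
  return out;
}
```

For example, `squares(1, 7, 2)` fills indices 1, 3, 5, and 7 and leaves the remaining elements at -1.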
PR stack:
- llvm#126026
- llvm#127595
- llvm#127633 (this PR)
- llvm#127634
- llvm#127635
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request
Extends do concurrent mapping to handle "loop-local values". A loop-local value is one that is used exclusively inside the loop but allocated outside of it. This usually corresponds to temporary values that are used inside the loop body, for example to initialize other variables. After collecting these values, the pass localizes them to the loop nest by moving their allocations.
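The idea of localizing a loop-local value can be sketched in C++ as follows. This is only an analogy under stated assumptions: the temporary `tmp` plays the role of a value allocated outside the loop but used only inside it, and the function name `localized_temp` is hypothetical. It is not the pass's actual transformation on FIR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

std::vector<int> localized_temp(int n) {
  assert(n >= 0);
  std::vector<int> out(static_cast<std::size_t>(n));
  int *dst = out.data();

  // Before localization, the temporary would be declared here, outside the
  // loop, and would be shared by all threads once the loop runs in parallel:
  //   int tmp;
#pragma omp parallel for
  for (int i = 0; i < n; ++i) {
    // After localization, the allocation lives inside the loop body, so
    // every iteration works on its own copy and no data race is possible.
    int tmp = 3 * i;
    dst[i] = tmp + 1;
  }
  return out;
}
```

Moving the allocation into the loop is what makes the temporary private to each iteration without needing an explicit `private` clause.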
PR stack:
- llvm#126026
- llvm#127595
- llvm#127633
- llvm#127634
- llvm#127635 (this PR)
ergawy added a commit that referenced this pull request
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request
ergawy deleted the users/ergawy/upstream_do_concurrent_2 branch
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request
ergawy added a commit that referenced this pull request
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request
ergawy added a commit that referenced this pull request
Adds support for converting multi-range loops to OpenMP (on the host
only for now). The changes here "prepare" a loop nest for collapsing by
sinking iteration variables to the innermost fir.do_loop op in the
nest.
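A C++ analogue of this preparation step is sketched below. It assumes that, before the change, each loop updates its iteration variable at the top of its own body, which leaves code between the outer and inner loops; the names `sunk_nest`, `i_var`, and `j_var` are hypothetical and only illustrate the idea, not the actual FIR rewrite.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

std::vector<int> sunk_nest(int rows, int cols) {
  assert(rows >= 0 && cols >= 0);
  std::vector<int> out(static_cast<std::size_t>(rows) *
                       static_cast<std::size_t>(cols), 0);
  int *dst = out.data();

#pragma omp parallel for collapse(2)
  for (int i = 0; i < rows; ++i) {
    // Before sinking, the update `i_var = i;` would sit here, between the
    // two loops, which breaks the perfect nesting that collapsing expects.
    for (int j = 0; j < cols; ++j) {
      int i_var = i;  // sunk from the outer loop body into the innermost loop
      int j_var = j;
      dst[i_var * cols + j_var] = i_var - j_var;
    }
  }
  return out;
}
```

With both updates in the innermost body, the outer loop contains nothing but the inner loop, so the nest can be collapsed as a whole.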
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request
ergawy added a commit that referenced this pull request
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request