[flang][OpenMP] Upstream do concurrent loop-nest detection. by ergawy · Pull Request #127595 · llvm/llvm-project

@ergawy

bhandarkar-pranav

This was referenced

Mar 4, 2025

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request

Mar 27, 2025

@ergawy

…126026)

This PR starts the effort to upstream AMD's internal implementation of do concurrent to OpenMP mapping. This replaces llvm#77285 since we extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current status downstream, the upstreaming status, and next steps to make this pass much more useful.

In addition to this document, this PR also contains the skeleton of the pass (no useful transformations are done yet) and some testing for the added command line options.

This looks like a huge PR but a lot of the added stuff is documentation.

It is also worth noting that the downstream pass has been validated on https://github.com/BerkeleyLab/fiats. For the CPU mapping, this achieved performance speed-ups that match pure OpenMP; for GPU mapping, we are still working on extending our support for implicit memory mapping and locality specifiers.

PR stack:

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request

Mar 27, 2025

@ergawy

…27595)

Upstreams the next part of the do concurrent to OpenMP mapping pass (from AMD's ROCm implementation). See llvm#126026 for more context.

This PR adds loop nest detection logic. This enables us to discover multi-range do concurrent loops and then map them as "collapsed" loop nests to OpenMP.
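
As a rough illustration of what "loop nest detection" means here, the following is a minimal Python sketch on a toy loop model. It is not the pass's actual code: the real pass works on MLIR ops such as fir.do_loop, and the `Loop`, `Stmt`, and `collect_perfect_nest` names are hypothetical. The sketch assumes a nest is collapsible when each loop's body holds exactly one nested loop and nothing else.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Stmt:
    """Hypothetical stand-in for any non-loop operation."""
    text: str


@dataclass
class Loop:
    """Hypothetical stand-in for a single loop with inclusive bounds."""
    var: str
    lower: int
    upper: int
    body: List[Union["Loop", Stmt]] = field(default_factory=list)


def collect_perfect_nest(outer: Loop) -> List[Loop]:
    """Walk inward while each body contains exactly one nested loop."""
    nest = [outer]
    current = outer
    while len(current.body) == 1 and isinstance(current.body[0], Loop):
        current = current.body[0]
        nest.append(current)
    return nest


# Toy model of: do concurrent (i=1:4, j=1:3)
inner = Loop("j", 1, 3, [Stmt("a(i,j) = i + j")])
outer = Loop("i", 1, 4, [inner])

nest = collect_perfect_nest(outer)
collapse_depth = len(nest)  # number of loops that could be collapsed together

# An imperfect nest: a statement sits between the two loop headers.
imperfect = Loop("i", 1, 4, [Stmt("x = 0"), Loop("j", 1, 3, [Stmt("b(i,j) = x")])])
imperfect_depth = len(collect_perfect_nest(imperfect))
```

In this toy model the two-range loop is detected as a nest of depth 2, while the imperfect nest stops at depth 1 and would be mapped as a single loop.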

This is a follow-up to llvm#126026; only the latest commit is relevant.

This is a replacement for llvm#127478 using a users/<username>/<branchname> branch.

PR stack:

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request

Mar 27, 2025

@ergawy

…ructs (llvm#127633)

Upstreams one more part of the ROCm do concurrent to OpenMP mapping pass. This PR adds support for converting simple loops to the equivalent OpenMP constructs on the host: omp parallel do. Towards that end, we have to collect more information about loop nests, for which we add new utils in the looputils namespace.

PR stack:

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request

Mar 27, 2025

@ergawy

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request

Mar 27, 2025

@ergawy

…lvm#127635)

Extends do concurrent mapping to handle "loop-local values". A loop-local value is one that is used exclusively inside the loop but allocated outside of it. This usually corresponds to temporary values that are used inside the loop body, for example to initialize other variables. After collecting these values, the pass localizes them to the loop nest by moving their allocations.
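
The collect-then-localize idea can be sketched in Python as follows. This is only an illustration under simplifying assumptions: values are plain names, uses are tagged with whether they occur inside the loop, and `Use`, `find_loop_local_values`, and `localize` are hypothetical helpers, not functions from the pass.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Use:
    """Hypothetical record of one use of a value."""
    value: str
    inside_loop: bool


def find_loop_local_values(allocated_outside: Set[str], uses: List[Use]) -> List[str]:
    """Values allocated outside the loop whose uses are all inside it."""
    used_inside = {u.value for u in uses if u.inside_loop}
    used_outside = {u.value for u in uses if not u.inside_loop}
    return sorted(
        v for v in allocated_outside if v in used_inside and v not in used_outside
    )


def localize(outer_allocs: List[str], local_values: List[str]) -> Tuple[List[str], List[str]]:
    """Split allocations into those staying outside and those moved into the loop."""
    stay = [v for v in outer_allocs if v not in local_values]
    moved = [v for v in outer_allocs if v in local_values]
    return stay, moved


# 'tmp' is only touched in the loop body; 'n' and 'a' are also used outside.
outer_allocs = ["tmp", "n", "a"]
uses = [
    Use("tmp", True),
    Use("n", False), Use("n", True),
    Use("a", True), Use("a", False),
]

local_values = find_loop_local_values(set(outer_allocs), uses)
remaining_outside, moved_into_loop = localize(outer_allocs, local_values)
```

Here only `tmp` qualifies as loop-local, so its allocation is the only one moved into the loop nest.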

PR stack:

ergawy added a commit that referenced this pull request

Apr 2, 2025

@ergawy

@ergawy

Upstreams the next part of the do concurrent to OpenMP mapping pass (from AMD's ROCm implementation). See #126026 for more context.

This PR adds loop nest detection logic. This enables us to discover multi-range do concurrent loops and then map them as "collapsed" loop nests to OpenMP.

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request

Apr 2, 2025

@ergawy @github-actions

…ping (#126026)

@ergawy ergawy deleted the users/ergawy/upstream_do_concurrent_2 branch

April 2, 2025 08:12

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request

Apr 2, 2025

@ergawy @github-actions

ergawy added a commit that referenced this pull request

Apr 2, 2025

@ergawy

…ructs (#127633)

Upstreams one more part of the ROCm do concurrent to OpenMP mapping pass. This PR adds support for converting simple loops to the equivalent OpenMP constructs on the host: omp parallel do. Towards that end, we have to collect more information about loop nests, for which we add new utils in the looputils namespace.

PR stack:

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request

Apr 2, 2025

@ergawy @github-actions

ergawy added a commit that referenced this pull request

Apr 2, 2025

@ergawy

…127634)

Adds support for converting multi-range loops to OpenMP (on the host only for now). The changes here "prepare" a loop nest for collapsing by sinking iteration variables to the innermost fir.do_loop op in the nest.
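
To make the "sinking" step more concrete, here is a small Python sketch on a toy loop model. It is an assumption-laden illustration, not the pass's implementation: `Loop`, `Stmt`, and `sink_to_innermost` are hypothetical, and the sketch assumes the statements that precede each nested loop only set up that level's iteration variable, so moving them inward does not change behavior.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Stmt:
    """Hypothetical stand-in for a non-loop operation."""
    text: str


@dataclass
class Loop:
    """Hypothetical stand-in for one loop level."""
    var: str
    body: List[Union["Loop", Stmt]] = field(default_factory=list)


def sink_to_innermost(outer: Loop) -> Loop:
    """Move statements that precede a nested loop into the innermost body (in place)."""
    sunk: List[Stmt] = []
    current = outer
    while True:
        nested = [op for op in current.body if isinstance(op, Loop)]
        # Stop unless there is exactly one nested loop and it is the last op.
        if len(nested) != 1 or current.body[-1] is not nested[0]:
            break
        sunk.extend(op for op in current.body if isinstance(op, Stmt))
        current.body = [nested[0]]
        current = nested[0]
    # Sunk statements keep their outer-to-inner order ahead of the original body.
    current.body = sunk + current.body
    return outer


# Each level first stores its iteration variable, then runs the next level.
inner = Loop("j", [Stmt("store j"), Stmt("a(i,j) = 0")])
outer = Loop("i", [Stmt("store i"), inner])

sink_to_innermost(outer)
inner_texts = [op.text for op in inner.body]
```

After sinking, the outer loop's body contains only the inner loop, which is the perfectly nested shape that a collapsed mapping needs.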

PR stack:

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request

Apr 2, 2025

@ergawy @github-actions

ergawy added a commit that referenced this pull request

Apr 2, 2025

@ergawy

…127635)

Extends do concurrent mapping to handle "loop-local values". A loop-local value is one that is used exclusively inside the loop but allocated outside of it. This usually corresponds to temporary values that are used inside the loop body, for example to initialize other variables. After collecting these values, the pass localizes them to the loop nest by moving their allocations.

PR stack:

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request

Apr 2, 2025

@ergawy @github-actions