[RFC][OpenMP] Early splitting of compound directives

November 4, 2025, 6:19pm 1

Proposal

Break up compound directives into constituent directives in the AST (as a part of the AST canonicalization for OpenMP) instead of doing it during lowering to MLIR.

This is a change in implementation only.

Motivation

The canonicalization happens early in the semantic analysis phase, before the OpenMP structure checker runs. The structure checker is the pass that performs the semantic correctness checks of OpenMP directives. Having the compound directives already broken up by that time would allow these checks to be more accurate.

Example 1

A compound directive (including any arguments and clauses) is legal iff the break-up and clause assignment can be done following the procedure described by the OpenMP spec. Currently the splitting happens late, after any chance of emitting a diagnostic message has passed, so a failure causes compilation to abort. If the splitting were done early, a diagnostic message could be emitted instead.
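To make the difference concrete, here is a minimal sketch of a splitter that reports problems as diagnostics instead of aborting. It is not Flang code: the `ALLOWED` table, the `Leaf` class, and `split_compound` are hypothetical names, and the clause table is a small simplification of the spec's actual rules.

```python
from dataclasses import dataclass, field

# Hypothetical, heavily simplified table of which leaf constructs accept
# which clauses. The real rules in the OpenMP spec are far richer.
ALLOWED = {
    "target": {"map", "device"},
    "teams": {"num_teams"},
    "distribute": {"dist_schedule"},
    "parallel": {"num_threads", "proc_bind"},
    "do": {"schedule", "ordered"},
}


@dataclass
class Leaf:
    """One constituent construct and the clauses assigned to it."""
    name: str
    clauses: list = field(default_factory=list)


def split_compound(leaves, clauses):
    """Split a compound directive into leaf constructs.

    Returns (constructs, diagnostics). A clause that no leaf accepts
    produces a diagnostic string rather than raising or aborting.
    """
    constructs = [Leaf(name) for name in leaves]
    diagnostics = []
    directive = " ".join(leaves)
    for clause in clauses:
        accepting = [c for c in constructs if clause in ALLOWED.get(c.name, set())]
        if not accepting:
            diagnostics.append(f"clause '{clause}' is not allowed on '{directive}'")
            continue
        for construct in accepting:
            construct.clauses.append(clause)
    return constructs, diagnostics


if __name__ == "__main__":
    constructs, diagnostics = split_compound(
        ["target", "teams"], ["map", "num_teams", "num_threads"]
    )
    for construct in constructs:
        print(construct.name, construct.clauses)
    for message in diagnostics:
        print("error:", message)
```

In this sketch, `num_threads` on `target teams` yields a diagnostic while the valid clauses are still assigned, which is the behavior an early splitter would need in order to feed the structure checker.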

Example 2

The taskgraph construct has the following restriction:

Any variable referenced in a replayable construct that does not have static storage duration and that does not exist in the enclosing data environment of the taskgraph construct must be a private-only or firstprivate variable in the replayable construct.

To determine the data-sharing attribute of a variable in a construct the compiler may need to know the set of clauses applied to the construct. If the construct is a part of a compound directive, the exact set of applicable clauses is not known until the splitting is performed.
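As a rough illustration, consider only the `private` clause, which the spec's rules for combined constructs apply to the innermost leaf construct that permits it. The sketch below assumes that rule and nothing else; `PERMITS_PRIVATE` and `assign_private` are hypothetical names, and the set of constructs is a simplification.

```python
# Leaf constructs that accept a `private` clause in this simplified sketch.
PERMITS_PRIVATE = {"target", "teams", "distribute", "parallel", "do", "simd"}


def assign_private(leaves, private_vars):
    """Map each leaf (listed outermost first) to its private variables.

    Assumption: `private` lands only on the innermost leaf that permits it.
    Other data-sharing clauses follow different rules and are not modeled.
    """
    assignment = {leaf: set() for leaf in leaves}
    for leaf in reversed(leaves):  # walk from innermost to outermost
        if leaf in PERMITS_PRIVATE:
            assignment[leaf] = set(private_vars)
            break
    return assignment


if __name__ == "__main__":
    # !$omp target parallel private(x)
    print(assign_private(["target", "parallel"], ["x"]))
```

For `target parallel private(x)`, the sketch marks `x` as private on `parallel` and leaves `target` without an explicit attribute for `x`. A check that asks about the data-sharing attribute of `x` on one particular constituent therefore cannot be answered from the compound directive alone.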

Impact

The directive splitter would need to handle failures gracefully (instead of aborting).
The splitting code would be moved out of lowering and into semantics.
The existing semantic checks should be largely unaffected.
Unparsing would print the code after directive splitting.
Otherwise, the functionality would remain the same.

skatrak November 5, 2025, 12:53pm 2

I don’t have a strong opinion on this, but in principle it appears to me that it would indeed be a good idea to have the splitting be done as early as it makes sense to do, because of the reasons you listed and also so that we don’t have to use slightly different representations of the AST in semantics and lowering.

I’m not sure if information about the original source form is needed for anything other than unparsing, but a slightly different unparsed representation doesn’t seem too big a disadvantage. Perhaps semantics error messages, related to implicit clauses added by the directive splitting, might be difficult for users to address if we don’t clearly state where those clauses came from. But I imagine this is something we already don’t do.

tblah November 6, 2025, 10:20am 3

I also think this makes sense. It should definitely make the semantic checks easier to reason about. I don’t think it is a problem if the unparsed sources are different (but equivalent).