[SROA] Propagate no-signed-zeros(nsz) fast-math flag on the phi node using function attribute by yashssh · Pull Request #83381 · llvm/llvm-project (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation77 Commits1 Checks5 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

yashssh

Its expected that the sequence return X > 0.0 ? X : -X, compiled with -Ofast, produces fabs intrinsic. However, at this point, LLVM is unable to do so.

The above sequence goes through the following transformation during the pass pipeline:

  1. SROA pass generates the phi node. Here, it does not infer the fast-math flags on the phi node unlike clang frontend typically does.
  2. Phi node eventually gets translated into select instruction.
    Because of missing no-signed-zeros(nsz) fast-math flag on the select instruction, InstCombine pass fails to fold the sequence into fabs intrinsic.

This patch, as a part of SROA, tries to propagate nsz fast-math flag on the phi node using function attribute enabling this folding.

Closes #51601

Co-authored-by: Sushant Gokhale sgokhale@nvidia.com

@llvmbot

@llvm/pr-subscribers-llvm-transforms

Author: Yashwant Singh (yashssh)

Changes

Based on these C/C++ patterns when compiled with 'Ofast' return X > 0.0 ? X : -X;
return X < 0.0 ? -X : X;

InstCombine tries to propogate FMF to 'select' instructions before attempting a fold, but it can't safely propgate 'nsz' hence wasn't performing the optimization. OOTH we should be able to do this optimization at 'Ofast' same as gcc(https://godbolt.org/z/c69fe5fa6).

Bit of a workaround but this patch allows us to query the "no-signed-zeroes" function attribute added to the function during 'Ofast' compilation. Allowing instcombine to safely match the idiom.


Full diff: https://github.com/llvm/llvm-project/pull/83381.diff

2 Files Affected:

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp index 71fa9b9ba41ebb..bd56d92d5d3d0f 100644 --- a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp +++ b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp @@ -2742,7 +2742,11 @@ static Instruction *foldSelectWithFCmpToFabs(SelectInst &SI, // Note: We require "nnan" for this fold because fcmp ignores the signbit // of NAN, but IEEE-754 specifies the signbit of NAN values with // fneg/fabs operations.

diff --git a/llvm/test/Transforms/InstCombine/fabs.ll b/llvm/test/Transforms/InstCombine/fabs.ll index 7e380c2e4590a0..88b02a852f3d74 100644 --- a/llvm/test/Transforms/InstCombine/fabs.ll +++ b/llvm/test/Transforms/InstCombine/fabs.ll @@ -547,6 +547,20 @@ define double @select_fcmp_nnan_nsz_ult_zero_unary_fneg(double %x) { ret double %fabs }

@@ -839,6 +853,19 @@ define <2 x float> @select_fcmp_nnan_nsz_ugt_zero_unary_fneg(<2 x float> %x) { ret <2 x float> %fabs }

+define float @absfloat32f_ogt_fast_no_signed_zeroes(float %x) "no-signed-zeros-fp-math" { +; CHECK-LABEL: @absfloat32f_ogt_fast_no_signed_zeroes( +; CHECK-NEXT: entry: +; CHECK-NEXT: [[RETVAL_0:%.]] = call nnan ninf float @llvm.fabs.f32(float [[X:%.]]) +; CHECK-NEXT: ret float [[RETVAL_0]] +; +entry:

@github-actions GitHub Actions

✅ With the latest revision this PR passed the C/C++ code formatter.

@yashssh

The original inspiration for the missed optimization can be seen here https://godbolt.org/z/z3Es5oxqv.
This seemed to be the most straightforward way to achieve this optimization and a detailed discussion about the issue and why this fold was skipped by 'instcombine' is documented here #51601

goldsteinn

goldsteinn

andykaylor

@dtcxzyw dtcxzyw linked an issue

Mar 1, 2024

that may beclosed by this pull request

nikic

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering whether we should be propagating the function-level attributes to instruction flags somewhere so that this is handled generically, rather than in individual folds?

arsenm

arsenm

@dtcxzyw

I'm wondering whether we should be propagating the function-level attributes to instruction flags somewhere so that this is handled generically, rather than in individual folds?

TBH I think fast-math flags should only be function-level attributes :(

@yashssh

I'm wondering whether we should be propagating the function-level attributes to instruction flags somewhere so that this is handled generically, rather than in individual folds?

TBH I think fast-math flags should only be function-level attributes :(

Will that be more efficient? What about function inlining when we end up losing these attributes?

@yashssh

I'm wondering whether we should be propagating the function-level attributes to instruction flags somewhere so that this is handled generically, rather than in individual folds?

That can be done but will be a separate work altogether, my understanding is somewhere early in the pipeline. I was also suggested that it would require allowing loads and stores to have fast flags so that they are safely reflected in phis and selects.

@dtcxzyw

What about function inlining when we end up losing these attributes?

Is there any real-world project whose translation units/functions are compiled with different fast math flags?

@andykaylor

What about function inlining when we end up losing these attributes?

Is there any real-world project whose translation units/functions are compiled with different fast math flags?

We have pragmas that allow you to turn fast-math on an off locally, at a scope level, so even within a function the fast-math flags can change. I think this is an important capability. One of the top concerns I hear from customers who are using fast-math is that they need the ability to track down optimizations that are throwing their results too far out of range. A first step to tracking this down is to stop using fast-math for individual files or sets of files to bisect the problem down to which file is causing a failure. From there, it gets harder, but pragmas can help.

I'd like to introduce a change to the way we check fast-math flags that would let us insert some kind of debug counter to narrow it down in a way that would be similar to how opt-bisect works but effective at the individual transformation granularity. There are some obstacles to making that happen, but I think that would be an extremely beneficial capability. Once you find the place where an accuracy-related failure is being introduced, my idea is to have some way (probably based on pragmas) to turn fast-math off for that location only in the code without needing the debug counter. That's a bit of a pipe dream at this point, but I want to see us working towards that.

This is one of the reasons I don't like fast-math function attributes, but the way things currently work is that the attribute isn't set if the relevant fast-math flag is turned off anywhere in the function. The front end handles this with pragmas, but it means that the inliner needs to remove the attribute if inlining calls where the callee and caller don't both have it set.

@andykaylor

I'm wondering whether we should be propagating the function-level attributes to instruction flags somewhere so that this is handled generically, rather than in individual folds?

The problem is in deciding where to do that. I don't like the function attributes and have made a few attempts to have them deprecated, but there are cases that couldn't be solved without them. This case, as detailed in #51601, is one such example. The front end can't set the fast-math flags on load instructions, so they never make it to the phi instruction that gets created. Propogating the flags from the attribute to instructions wouldn't help unless it was done after SROA and/or SimplifyCFG, but maybe those passes could add the flag based on the function-level attribute. That might be a better way to fix this.

@yashssh

I'm leaving this pull request open for now while simultaneously looking if I can set FMF in SROA, will return back with whatever I find.

@andykaylor

I'm leaving this pull request open for now while simultaneously looking if I can set FMF in SROA, will return back with whatever I find.

This is obviously a contrived example, but here is a case that will not be optimized if you rely on the function attributes but could be optimized if the flags were propagated by SROA.

https://godbolt.org/z/senb3bEbn

@yashssh

Thanks for the example @andykaylor!

I tried a change that sets fast flag on phi nodes created by SROA

diff --git a/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp b/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
index 88b05aab8db4..1d85c7e0d8a6 100644
--- a/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
+++ b/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
@@ -1056,6 +1056,22 @@ bool PromoteMem2Reg::QueuePhiNode(BasicBlock *BB, unsigned AllocaNo,
   // BasicBlock.
   PN = PHINode::Create(Allocas[AllocaNo]->getAllocatedType(), getNumPreds(BB),
                        Allocas[AllocaNo]->getName() + "." + Twine(Version++));
+
+  if(BB->getParent()->hasFnAttribute("unsafe-fp-math")){
+    PN->setFast(true);
+  }
+
   PN->insertBefore(BB->begin());
   ++NumPHIInsert;
   PhiToAllocaMap[PN] = AllocaNo;

Is this the direction you were hinting towards?

@arsenm

Should add the Fixes: issue number to the description

@sushgokh

@arsenm

I do think we should have the phase ordering test

@sushgokh

I do think we should have the phase ordering test

Although I didn't have a PhaseOrdering test, I had initially added a test that showed the transformation after the concerned passes here and @jcranmer-intel insisted there that this is not required.

@jcranmer-intel

I do think we should have the phase ordering test

Although I didn't have a PhaseOrdering test, I had initially added a test that showed the transformation after the concerned passes here and @jcranmer-intel insisted there that this is not required.

The right way to do a phase ordering test is not the way you did it. There's a folder test/Transforms/PhaseOrdering directory where the phase-ordering tests go, and in general, you would probably want to check a test like opt -O1 to make sure that the passes are actually run in the order expected.

arsenm

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not great but was the consensus on the ticket. The phase orderingtest could use a comment and rename though

@sushgokh

…using function attribute

Its expected that the sequence return X > 0.0 ? X : -X, compiled with -Ofast, produces fabs intrinsic. However, at this point, LLVM is unable to do so.

The above sequence goes through the following transformation during the pass pipeline:

SROA pass generates the phi node. Here, it does not infer the fast-math flags on the phi node unlike clang frontend typically does. Phi node eventually gets translated into select instruction. Because of missing no-signed-zeros(nsz) fast-math flag on the select instruction, InstCombine pass fails to fold the sequence into fabs intrinsic. This patch, as a part of SROA, tries to propagate nsz fast-math flag on the phi node using function attribute enabling this folding.

Co-authored-by: Sushant Gokhale sgokhale@nvidia.com

arsenm

@llvm-ci

LLVM Buildbot has detected a new failure on builder sanitizer-ppc64le-linux running on ppc64le-sanitizer while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/72/builds/683

Here is the relevant piece of the build log for the reference:

Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
PASS: MemorySanitizer-POWERPC64LE :: recover-dso.cpp (2428 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: heap-origin.cpp (2429 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: check_mem_is_initialized.cpp (2430 of 2450)
XFAIL: MemorySanitizer-POWERPC64LE :: vararg_shadow.cpp (2431 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: stack-origin2.cpp (2432 of 2450)
PASS: ScudoStandalone-Unit :: ./ScudoUnitTest-powerpc64le-Test/138/267 (2433 of 2450)
PASS: SanitizerCommon-tsan-powerpc64le-Linux :: max_allocation_size.cpp (2434 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: stack-origin.cpp (2435 of 2450)
PASS: ThreadSanitizer-powerpc64le :: bench_threads.cpp (2436 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: recover.cpp (2437 of 2450)
FAIL: ThreadSanitizer-powerpc64le :: signal_reset.cpp (2438 of 2450)
******************** TEST 'ThreadSanitizer-powerpc64le :: signal_reset.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/./bin/clang  --driver-mode=g++ -fsanitize=thread -Wall  -m64 -fno-function-sections   -gline-tables-only -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -std=c++11 -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -nostdinc++ -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/lib/tsan/libcxx_tsan_powerpc64le/include/c++/v1 -O1 /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_reset.cpp.tmp &&  /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_reset.cpp.tmp 2>&1 | FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/./bin/clang --driver-mode=g++ -fsanitize=thread -Wall -m64 -fno-function-sections -gline-tables-only -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -std=c++11 -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -nostdinc++ -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/lib/tsan/libcxx_tsan_powerpc64le/include/c++/v1 -O1 /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_reset.cpp.tmp
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_reset.cpp.tmp
+ FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp
/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp:73:15: error: CHECK-NOT: excluded string found in input
// CHECK-NOT: WARNING: ThreadSanitizer:
              ^
<stdin>:2:1: note: found here
WARNING: ThreadSanitizer: signal handler spoils errno (pid=3639744)
^~~~~~~~~~~~~~~~~~~~~~~~~

Input file: <stdin>
Check file: /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
        1: ================== 
        2: WARNING: ThreadSanitizer: signal handler spoils errno (pid=3639744) 
not:73     !~~~~~~~~~~~~~~~~~~~~~~~~                                            error: no match expected
        3:  Signal 27 handler invoked at: 
        4:  #0 handler(int) /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp:15 (signal_reset.cpp.tmp+0x1019d0) 
        5:  
        6: SUMMARY: ThreadSanitizer: signal handler spoils errno /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp:15 in handler(int) 
        7: ================== 
        8: DONE 
        9: ThreadSanitizer: reported 1 warnings 
>>>>>>

--

********************
Step 11 (test compiler-rt debug) failure: test compiler-rt debug (failure)
...
PASS: MemorySanitizer-POWERPC64LE :: recover-dso.cpp (2428 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: heap-origin.cpp (2429 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: check_mem_is_initialized.cpp (2430 of 2450)
XFAIL: MemorySanitizer-POWERPC64LE :: vararg_shadow.cpp (2431 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: stack-origin2.cpp (2432 of 2450)
PASS: ScudoStandalone-Unit :: ./ScudoUnitTest-powerpc64le-Test/138/267 (2433 of 2450)
PASS: SanitizerCommon-tsan-powerpc64le-Linux :: max_allocation_size.cpp (2434 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: stack-origin.cpp (2435 of 2450)
PASS: ThreadSanitizer-powerpc64le :: bench_threads.cpp (2436 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: recover.cpp (2437 of 2450)
FAIL: ThreadSanitizer-powerpc64le :: signal_reset.cpp (2438 of 2450)
******************** TEST 'ThreadSanitizer-powerpc64le :: signal_reset.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/./bin/clang  --driver-mode=g++ -fsanitize=thread -Wall  -m64 -fno-function-sections   -gline-tables-only -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -std=c++11 -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -nostdinc++ -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/lib/tsan/libcxx_tsan_powerpc64le/include/c++/v1 -O1 /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_reset.cpp.tmp &&  /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_reset.cpp.tmp 2>&1 | FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/./bin/clang --driver-mode=g++ -fsanitize=thread -Wall -m64 -fno-function-sections -gline-tables-only -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -std=c++11 -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -nostdinc++ -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/lib/tsan/libcxx_tsan_powerpc64le/include/c++/v1 -O1 /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_reset.cpp.tmp
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_reset.cpp.tmp
+ FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp
/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp:73:15: error: CHECK-NOT: excluded string found in input
// CHECK-NOT: WARNING: ThreadSanitizer:
              ^
<stdin>:2:1: note: found here
WARNING: ThreadSanitizer: signal handler spoils errno (pid=3639744)
^~~~~~~~~~~~~~~~~~~~~~~~~

Input file: <stdin>
Check file: /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
        1: ================== 
        2: WARNING: ThreadSanitizer: signal handler spoils errno (pid=3639744) 
not:73     !~~~~~~~~~~~~~~~~~~~~~~~~                                            error: no match expected
        3:  Signal 27 handler invoked at: 
        4:  #0 handler(int) /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp:15 (signal_reset.cpp.tmp+0x1019d0) 
        5:  
        6: SUMMARY: ThreadSanitizer: signal handler spoils errno /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp:15 in handler(int) 
        7: ================== 
        8: DONE 
        9: ThreadSanitizer: reported 1 warnings 
>>>>>>

--

********************

@llvm-ci

LLVM Buildbot has detected a new failure on builder lldb-aarch64-windows running on linaro-armv8-windows-msvc-05 while building llvm at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/141/builds/428

Here is the relevant piece of the build log for the reference:

Step 6 (test) failure: build (failure)
...
30.698 [1/2/288] Linking CXX executable tools\lldb\unittests\Thread\ThreadTests.exe
30.814 [1/1/289] Linking CXX executable tools\lldb\unittests\ValueObject\LLDBValueObjectTests.exe
llvm-lit.py: C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\utils\lit\lit\llvm\config.py:57: note: using lit tools: C:\Users\tcwg\scoop\apps\git\2.39.0.windows.2\usr\bin
llvm-lit.py: C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\utils\lit\lit\llvm\config.py:508: note: using clang: c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\clang.exe
llvm-lit.py: C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\utils\lit\lit\llvm\config.py:508: note: using ld.lld: c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\ld.lld.exe
llvm-lit.py: C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\utils\lit\lit\llvm\config.py:508: note: using lld-link: c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\lld-link.exe
llvm-lit.py: C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\utils\lit\lit\llvm\config.py:508: note: using ld64.lld: c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\ld64.lld.exe
llvm-lit.py: C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\utils\lit\lit\llvm\config.py:508: note: using wasm-ld: c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\wasm-ld.exe
30.814 [0/1/289] Running lldb-- Testing: 1983 tests, 2 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 
FAIL: lldb-shell :: SymbolFile/DWARF/x86/dwp-foreign-type-units.cpp (1616 of 1983)
******************** TEST 'lldb-shell :: SymbolFile/DWARF/x86/dwp-foreign-type-units.cpp' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 19
c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\clang.exe --target=specify-a-target-or-use-a-_host-substitution -target x86_64-pc-linux -gdwarf-5 -gsplit-dwarf    -fdebug-types-section -gpubnames -c C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\SymbolFile\DWARF\x86\dwp-foreign-type-units.cpp -o C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.main.o
# executed command: 'c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\clang.exe' --target=specify-a-target-or-use-a-_host-substitution -target x86_64-pc-linux -gdwarf-5 -gsplit-dwarf -fdebug-types-section -gpubnames -c 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\SymbolFile\DWARF\x86\dwp-foreign-type-units.cpp' -o 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.main.o'
# note: command had no output on stdout or stderr
# RUN: at line 21
c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\clang.exe --target=specify-a-target-or-use-a-_host-substitution -target x86_64-pc-linux -gdwarf-5 -gsplit-dwarf -DVARIANT    -fdebug-types-section -gpubnames -c C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\SymbolFile\DWARF\x86\dwp-foreign-type-units.cpp -o C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.foo.o
# executed command: 'c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\clang.exe' --target=specify-a-target-or-use-a-_host-substitution -target x86_64-pc-linux -gdwarf-5 -gsplit-dwarf -DVARIANT -fdebug-types-section -gpubnames -c 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\SymbolFile\DWARF\x86\dwp-foreign-type-units.cpp' -o 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.foo.o'
# note: command had no output on stdout or stderr
# RUN: at line 23
c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\ld.lld.exe C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.main.o C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.foo.o -o C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp
# executed command: 'c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\ld.lld.exe' 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.main.o' 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.foo.o' -o 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp'
# .---command stderr------------
# | ld.lld: warning: cannot find entry symbol _start; not setting start address
# `-----------------------------
# RUN: at line 26
rm -f C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.dwp
# executed command: rm -f 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.dwp'
# note: command had no output on stdout or stderr
# RUN: at line 27
c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\lldb.exe --no-lldbinit -S C:/Users/tcwg/llvm-worker/lldb-aarch64-windows/build/tools/lldb\test\Shell\lit-lldb-init-quiet    -o "type lookup IntegerType"    -o "type lookup FloatType"    -o "type lookup CustomType"    -b C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp | c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\filecheck.exe C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\SymbolFile\DWARF\x86\dwp-foreign-type-units.cpp --check-prefix=NODWP
# executed command: 'c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\lldb.exe' --no-lldbinit -S 'C:/Users/tcwg/llvm-worker/lldb-aarch64-windows/build/tools/lldb\test\Shell\lit-lldb-init-quiet' -o 'type lookup IntegerType' -o 'type lookup FloatType' -o 'type lookup CustomType' -b 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp'
# note: command had no output on stdout or stderr
# executed command: 'c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\filecheck.exe' 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\SymbolFile\DWARF\x86\dwp-foreign-type-units.cpp' --check-prefix=NODWP
# .---command stderr------------
# | C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\SymbolFile\DWARF\x86\dwp-foreign-type-units.cpp:35:11: error: NODWP: expected string not found in input
# | // NODWP: (lldb) type lookup FloatType
# |           ^
# | <stdin>:20:22: note: scanning from here
# |  typedef unsigned int IntegerType;
# |                      ^
# | <stdin>:21:3: note: possible intended match here
# |  typedef float FloatType;
# |   ^

nikic added a commit to nikic/llvm-project that referenced this pull request

Jul 2, 2024

@nikic

llvm#83381 introduced a call to PHINode::isComplete() in Mem2Reg, which is O(n^2) in the number of predecessors, resulting in pathological compile-time regressions for cases with many predecessors.

I believe this was intended as a compile-time optimization in the first place, so just drop the check, and always try to transfer the attribute.

nikic added a commit that referenced this pull request

Jul 2, 2024

@nikic

#83381 introduced a call to PHINode::isComplete() in Mem2Reg, which is O(n^2) in the number of predecessors, resulting in pathological compile-time regressions for cases with many predecessors.

Remove the isComplete() check and instead cache the attribute lookup, to only perform it once per function. Actually setting the FMF flag is cheap.

lravenclaw pushed a commit to lravenclaw/llvm-project that referenced this pull request

Jul 3, 2024

…using function attribute (llvm#83381)

Its expected that the sequence return X > 0.0 ? X : -X, compiled with -Ofast, produces fabs intrinsic. However, at this point, LLVM is unable to do so.

The above sequence goes through the following transformation during the pass pipeline:

  1. SROA pass generates the phi node. Here, it does not infer the fast-math flags on the phi node unlike clang frontend typically does.
  2. Phi node eventually gets translated into select instruction. Because of missing no-signed-zeros(nsz) fast-math flag on the select instruction, InstCombine pass fails to fold the sequence into fabs intrinsic.

This patch, as a part of SROA, tries to propagate nsz fast-math flag on the phi node using function attribute enabling this folding.

Closes llvm#51601

Co-authored-by: Sushant Gokhale sgokhale@nvidia.com

lravenclaw pushed a commit to lravenclaw/llvm-project that referenced this pull request

Jul 3, 2024

@nikic @lravenclaw

llvm#83381 introduced a call to PHINode::isComplete() in Mem2Reg, which is O(n^2) in the number of predecessors, resulting in pathological compile-time regressions for cases with many predecessors.

Remove the isComplete() check and instead cache the attribute lookup, to only perform it once per function. Actually setting the FMF flag is cheap.

kbluck pushed a commit to kbluck/llvm-project that referenced this pull request

Jul 6, 2024

…using function attribute (llvm#83381)

Its expected that the sequence return X > 0.0 ? X : -X, compiled with -Ofast, produces fabs intrinsic. However, at this point, LLVM is unable to do so.

The above sequence goes through the following transformation during the pass pipeline:

  1. SROA pass generates the phi node. Here, it does not infer the fast-math flags on the phi node unlike clang frontend typically does.
  2. Phi node eventually gets translated into select instruction. Because of missing no-signed-zeros(nsz) fast-math flag on the select instruction, InstCombine pass fails to fold the sequence into fabs intrinsic.

This patch, as a part of SROA, tries to propagate nsz fast-math flag on the phi node using function attribute enabling this folding.

Closes llvm#51601

Co-authored-by: Sushant Gokhale sgokhale@nvidia.com

kbluck pushed a commit to kbluck/llvm-project that referenced this pull request

Jul 6, 2024

@nikic @kbluck

llvm#83381 introduced a call to PHINode::isComplete() in Mem2Reg, which is O(n^2) in the number of predecessors, resulting in pathological compile-time regressions for cases with many predecessors.

Remove the isComplete() check and instead cache the attribute lookup, to only perform it once per function. Actually setting the FMF flag is cheap.

@nunoplopes

Alive2 complains about this commit:

define double @fabs_fcmp_olt_nsz_func_attr(double %#0, double %#1) { entry: %x = alloca i64 8, align 8 store double %#0, ptr %x, align 8 %cmp = fcmp nnan nsz olt double %#0, 0.000000 br i1 %cmp, label %if.then, label %return

if.then: %fneg = fneg nnan nsz double %#0 store double %fneg, ptr %x, align 8 br label %return

return: %ret = load double, ptr %x, align 8 ret double %ret } => define double @fabs_fcmp_olt_nsz_func_attr(double %#0, double %#1) { entry: %cmp = fcmp nnan nsz olt double %#0, 0.000000 br i1 %cmp, label %if.then, label %return

if.then: %fneg = fneg nnan nsz double %#0 br label %return

return: %x.0 = phi nsz double [ %fneg, %if.then ], [ %#0, %entry ] ret double %x.0 } Transformation doesn't verify! (unsound) ERROR: Target's return value is more undefined

Example: double %#0 = #x0000000000000000 (+0.0) double %#1 = poison

Source: ptr %x = pointer(local, block_id=0, offset=0) i1 %cmp = #x0 (0)

Jump to %return double %ret = #x0000000000000000 (+0.0)

Target: i1 %cmp = #x0 (0)

Jump to %return double %x.0 = #x8000000000000000 (-0.0) Source value: #x0000000000000000 (+0.0) Target value: #x8000000000000000 (-0.0)

@nikic

@nunoplopes As far as I can tell, alive2 does not have support for the no-signed-zeros-fp-math attribute, so I think that's expected.

@nunoplopes

@nunoplopes As far as I can tell, alive2 does not have support for the no-signed-zeros-fp-math attribute, so I think that's expected.

Ah, thanks, I missed that.
To my defense, it's not documented in LangRef 😅

pawosm-arm added a commit to pawosm-arm/llvm-project that referenced this pull request

Oct 31, 2024

@pawosm-arm

…he nsz flag

Following the change introduced by the PR llvm#83381, this patch extends it with the same treatment of the nnan fast-math flag. This is to address the performance drop caused by PR#83200 which prevented vital InstCombine transformation due to the lack of relevant fast-math flags.

The PromoteMem2Reg utility is used by the SROA pass, where Phi nodes are being created. Proposed change allows propagation of the nnan flag down to these Phi nodes.

pawosm-arm added a commit to pawosm-arm/llvm-project that referenced this pull request

Oct 31, 2024

@pawosm-arm

…he nsz flag

Following the change introduced by the PR llvm#83381, this patch extends it with the same treatment of the nnan fast-math flag. This is to address the performance drop caused by PR#83200 which prevented vital InstCombine transformation due to the lack of relevant fast-math flags.

The PromoteMem2Reg utility is used by the SROA pass, where Phi nodes are being created. Proposed change allows propagation of the nnan flag down to these Phi nodes.

pawosm-arm added a commit to pawosm-arm/llvm-project that referenced this pull request

Nov 5, 2024

@pawosm-arm

…par with the nsz flag

Following the change introduced by the PR llvm#83381, this patch extends it with the same treatment of the nnan and ninf fast-math flags.

This is to address the performance drop caused by PR#83200 which prevented vital InstCombine transformation due to the lack of the relevant fast-math flag.

The PromoteMem2Reg utility is used by the SROA pass, where Phi nodes are being created. Proposed change allows propagation of the nnan and ninf flags down to these Phi nodes.

pawosm-arm added a commit to pawosm-arm/llvm-project that referenced this pull request

Nov 5, 2024

@pawosm-arm

…par with the nsz flag

Following the change introduced by the PR llvm#83381, this patch extends it with the same treatment of the nnan and ninf fast-math flags.

This is to address the performance drop caused by PR#83200 which prevented vital InstCombine transformation due to the lack of the relevant fast-math flag.

The PromoteMem2Reg utility is used by the SROA pass, where Phi nodes are being created. Proposed change allows propagation of the nnan and ninf flags down to these Phi nodes.