[llvm-dev] Noncapture use of locals disabling TailRecursionElimination (original) (raw)

Nick Lewycky via llvm-dev llvm-dev at lists.llvm.org
Fri May 8 16:38:22 PDT 2020


On 2020-05-08 2:58 p.m., Xun Li wrote:

Eli, Yes I was referring to AllCallsAreTailCalls. I will take a look at how to improve this.

Nick, Thanks. I agree that's the proper constrain to mark a call as tailcall, however not being able to mark a call as tailcall shouldn't completely kill TCE. (i.e. AllCallsAreTailCalls seems overly limiting).

I get it now. TailRecursionElimination.cpp does two optimizations, marking of tail and converting recursion to loops. I thought you were proposing a change to the marking of tail. Thanks for the example!

"PR962" refers to "problem report 962" or https://llvm.org/PR962 . "rdar" is Apple's "radar" bug tracking system, these are generally internal to their company but sometimes they're available in whole or in part on https://openradar.appspot.com/ .

Nick

Specifically, I have been looking into an issue where Clang cannot TCE the following code:

class Foo { public: attribute((noinline)) ~Foo() {} }; void callee() { Foo f; } void funcRecurse() { callee(); funcRecurse(); } int main() { funcRecurse(); } In the above code, callee will be inlined into funcRecurse, generating calls to ~Foo() along with lifetime management intrinsics, all using locals, disabling TCE. Clang can successfully TCE if we force callee to not inline. On Fri, May 8, 2020 at 1:53 PM Nick Lewycky <nicholas at mxc.ca> wrote: On 2020-05-08 1:34 p.m., Xun Li wrote:

Hi,

I was looking into the implementation of TailRecursionElimination, and noticed that we have the constrain that if any call uses a local, even though it doesn't capture the local, it would still prohibit TCE. This contain seems unnecessary and overly limiting? I think it's a necessary limitation. The idea is that something marked 'tail' could be emitted as a jump instruction instead of a call instruction. In order to do that, the jumpee has to 'ret' to the same place our function would have returned to. Since the return value is the current top-of-the-stack value when returning[1], the jumper must remove any local variables -- in LLVM, allocas -- from the stack before jumping. This implies that the jumpee may not access our locals at all, it is not sufficient for the callee to merely not capture them. Further to that point, we have an optimization in BasicAA where we conclude that a call marked "tail" does not alias any allocas of the calling function. There's a missed optimization here regarding "notail". Because "tail" can be used as an optimization for BasicAA, we should actually permit functions to be marked "tail" (not accessing any of the callers stack) and still be marked "notail" (the machine code must be a call, not a jump). These aren't actually mutually exclusive, even though they sound like they are. Nick [1] - not true on Windows, but even there the number of bytes deep the return value is, is a fixed constant that is determined entirely by the callee. The jumper can't have extra bytes on the stack at the moment of the jump. Relevant code is here: https://github.com/llvm/llvm-project/blob/cbe77ca9bd05393b1df1abf016b01f44d1d10a49/llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp#L275 Looking through the tests that cover this scenario (https://github.com/llvm/llvm-project/blob/e29874eaa04d24b8c67776bf5729474d671a58f6/llvm/test/Transforms/TailCallElim/basic.ll#L66), I found it referring to rdar://14324281 and PR962. What are they? These haven't been updated since 2014, so I wonder what is the latest state and have them been resolved? cc nicholas who authored the original code. Thanks!



More information about the llvm-dev mailing list