[llvm-dev] structure-return tailcall

Nathan Sidwell via llvm-dev llvm-dev at lists.llvm.org
Wed Aug 18 11:43:45 PDT 2021


I'm working on PR51000, and thinking about the case of large structures returned via the artificial sret pointer parameter.

I have questions.

The Itanium ABI requires functions that return a large struct this way to also return that pointer as their scalar return value. (Let's not get into the pros and cons of that, it is what it is. I'm looking at x86_64 primarily, but I understand other ISAs have similar ABIs.)

Anyway, doing that requires some data-flow work and, being a newbie to LLVM IR, I can see two ways to do it. It is not clear to me which is easier or better. Plus I find discrepancies between the documentation, tests and implementation!

Consider:

struct Big { int ary[50]; Big (); };

Big Foo ();

Big Bar () { return Foo (); }

Here's the IR:

define dso_local void @_Z3Barv(%struct.Big* noalias sret(%struct.Big) align 4 %0) local_unnamed_addr #0 {
  tail call void @_Z3Foov(%struct.Big* sret(%struct.Big) align 4 %0)
  ret void
}

I.e. the middle end figures this is tail callable, but I don't think it knows about the pointer return requirement (see below for evidence).
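To make the pointer return requirement concrete, here is a hand-lowered C++ sketch of Foo and Bar. It is only a model of the calling convention, not what clang emits: the names Foo_lowered and Bar_lowered are made up for illustration, Big's constructor is dropped, and the body of Foo_lowered is invented so the example does something.

```cpp
#include <cassert>

struct Big { int ary[50]; };

// Hypothetical hand-lowered stand-in for `Big Foo();`. The caller passes the
// address of the result slot, and the callee hands the same address back as
// its scalar return value (%rax on x86_64).
Big *Foo_lowered(Big *sret) {
  for (int i = 0; i < 50; ++i)
    sret->ary[i] = i;  // invented body, just to fill the result slot
  return sret;
}

// Hypothetical hand-lowered stand-in for `Big Bar() { return Foo(); }`.
// Bar forwards its own incoming sret pointer, so the value Foo returns is
// already the value Bar has to return and the call can stay in tail position.
Big *Bar_lowered(Big *sret) {
  return Foo_lowered(sret);
}
```

In this model the tail call is only valid because the pointer passed on to Foo is the one Bar received; the `void` return type in the IR hides that dependency.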

Test/documentation mismatch: The tail call documentation (https://llvm.org/docs/LangRef.html#call-instruction) says: 'Both markers imply that the callee does not access allocas from the caller.'

However, the X86 sibcall test (llvm/test/CodeGen/X86/sibcall.ll) seems to break that. Specifically:

define fastcc void @t21_sret_to_sret_alloca(%struct.foo* noalias sret(%struct.foo) %agg.result) nounwind {
  %a = alloca %struct.foo, align 8
  tail call fastcc void @t21_f_sret(%struct.foo* noalias sret(%struct.foo) %a) nounwind
  ret void
}

That call to t21_f_sret is referencing the frame-allocated %a object.
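The same case can be modelled in hand-lowered C++. This is a sketch under assumptions: the layout of foo, the body of the callee and the helper name callee_returns_callers_sret are all invented here; only the shape of the call comes from the test.

```cpp
#include <cassert>

struct foo { int x[4]; };  // assumed layout, for illustration only

// Hypothetical lowered callee: fills the slot it is given and returns the
// slot's address as its scalar result.
foo *t21_f_sret_lowered(foo *sret) {
  for (int &v : sret->x)
    v = 1;
  return sret;
}

// Models the body of t21_sret_to_sret_alloca. The callee writes into `a`,
// which lives in this function's frame, so the frame must stay alive for the
// duration of the call. The pointer the callee returns is &a, which is not
// the agg_result pointer this function would have to return.
bool callee_returns_callers_sret(foo *agg_result) {
  foo a;  // plays the role of the alloca
  foo *from_callee = t21_f_sret_lowered(&a);
  return from_callee == agg_result;
}
```

So in this model the call fails on two counts: it touches the caller's frame, and its return value is not the one the caller owes its own caller.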

Question: Is sibcall.ll correct or not?

Implementation/documentation mismatch: I also note that the tail marker can appear even when the call is NOT the last (real) instruction in the function. That seems strange.

The documentation says: 'The optional tail and musttail markers indicate that the optimizers should perform tail call optimization.'

Consider:

struct Big { int ary[50]; Big (); };

void Frob ();

Big Baz () { Big b; Frob (); return b; }

This generates:

define dso_local void @_Z3Bazv(%struct.Big* noalias nonnull sret(%struct.Big) align 4 %0) local_unnamed_addr #0 {
  tail call void @_ZN3BigC1Ev(%struct.Big* nonnull dereferenceable(200) %0)
  tail call void @_Z4Frobv()
  ret void
}

We can tail call Frob, but not Big's constructor. Why is the ctor marked as tailcallable?

[as an aside, if the middle end knew about the sret pointer return requirement, it wouldn't have marked Frob as tailcallable, right?]
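A hand-lowered C++ sketch of Baz shows both points. It assumes, as the IR suggests, that b is constructed directly in the sret slot; the names Baz_lowered, Big_ctor_lowered and call_order are made up, and the constructor and Frob bodies are invented.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Big { int ary[50]; };

std::vector<std::string> call_order;  // records the order of the calls

// Hypothetical stand-in for Big's constructor (body invented).
void Big_ctor_lowered(Big *self) {
  call_order.push_back("Big::Big");
  for (int &v : self->ary)
    v = 0;
}

// Stand-in for Frob (body invented).
void Frob() { call_order.push_back("Frob"); }

// Hypothetical hand-lowered `Big Baz() { Big b; Frob(); return b; }`.
Big *Baz_lowered(Big *sret) {
  Big_ctor_lowered(sret);  // not the last call, so never in tail position
  Frob();                  // last call, but it does not produce the pointer
  return sret;             // work remains after Frob: the pointer is returned
}
```

Once the pointer return is written out, Frob is followed by real work, so it is no longer in tail position either.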

Question: should the ctor not be marked tail call, or should the documentation be adjusted to at least mention this behaviour?

Anyway, the backend code-generator checks additional constraints before performing the tailcall.

a) Should the x86 backend track where it assigned the incoming sret pointer and see if that's being passed to the tail call? (I've not figured out how to do that yet).

b) or should the middle end annotate that tail call as passing the incoming sret? (metadata? new marker? something else?) This would seem to avoid having to implement #a for each backend that has this requirement.

Question: any insights as to whether #a or #b is the better direction?
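Either way, the underlying check is the same: does the call pass on the caller's incoming sret pointer as its own sret argument? A toy C++ model of that check follows. Nothing in it is LLVM API: ValueId, Param, Arg, Function, Call and both functions are hypothetical names used only to state the condition.

```cpp
#include <cassert>
#include <vector>

// Toy IR model, all names hypothetical.
using ValueId = int;
struct Param { ValueId id; bool isSRet; };
struct Arg { ValueId value; bool isSRet; };
struct Function { std::vector<Param> params; };
struct Call { std::vector<Arg> args; };

// Returns the id of the caller's sret parameter, or -1 if it has none.
ValueId incomingSRet(const Function &f) {
  for (const Param &p : f.params)
    if (p.isSRet)
      return p.id;
  return -1;
}

// True only if the call has an sret argument and that argument is the
// caller's own incoming sret pointer.
bool forwardsIncomingSRet(const Function &caller, const Call &call) {
  ValueId in = incomingSRet(caller);
  if (in < 0)
    return false;
  for (const Arg &a : call.args)
    if (a.isSRet)
      return a.value == in;
  return false;  // callee takes no sret argument at all
}
```

In this model the Bar/Foo call passes the check, while the t21 call (sret points at a local) and the Frob call (no sret argument) do not.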

nathan

-- Nathan Sidwell


