I suspect it was fixed by my local value sinking change, which deletes unused local value materializations like these.

On Fri, Sep 14, 2018 at 12:20 PM palpar via llvm-dev <llvm-dev@lists.llvm.org> wrote:
Thanks for checking. I suppose it may have been fixed, then; I don't have the latest version to try it now.
I'm curious what could have fixed it, because X86FastISel::fastLowerCall() still has the calls to getRegForValue() (or maybe that's not the problem).

On Fri, Sep 14, 2018 at 10:02 PM David Blaikie <dblaikie@gmail.com> wrote:
Still not seeing it on ToT, so maybe it's been fixed?

$ clang-tot -cc1 -S -triple i386-pc-win32 stack.c
...

_bar:
        subl    $16, %esp
        movl    $1, (%esp)
        movl    $2, 4(%esp)
        calll   _foo
        movl    $3, (%esp)
        movl    $4, 4(%esp)
        movl    %eax, 12(%esp)
        calll   _foo
        movl    %eax, 8(%esp)
        addl    $16, %esp
        retl

$ clang-tot --version
clang version 8.0.0 (trunk 342200) (llvm/trunk 342202)
Target: x86_64-unknown-linux-gnu
Thread model: posix




On Fri, Sep 14, 2018 at 11:57 AM palpar <palparni@gmail.com> wrote:
Sorry, I missed that important detail. The relevant part of the command line is:
-cc1 -S -triple i386-pc-win32
I don't expect it matters if it's for Windows or Linux in this case.

On Fri, Sep 14, 2018 at 9:16 PM David Blaikie <dblaikie@gmail.com> wrote:
Can't say I've observed that behavior (though I'm building from top-of-tree rather than 6.0, compiling for x86-64 on Linux). Perhaps you could provide more detail: what target are you compiling for? Possibly provide the -cc1 command line, etc.

bar:                                    # @bar
        .cfi_startproc
# %bb.0:                                # %entry
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register %rbp
        subq    $16, %rsp
        movl    $1, %edi
        movl    $2, %esi
        callq   foo
        movl    $3, %edi
        movl    $4, %esi
        movl    %eax, -4(%rbp)          # 4-byte Spill
        callq   foo
        movl    %eax, -8(%rbp)          # 4-byte Spill
        addq    $16, %rsp
        popq    %rbp
        .cfi_def_cfa %rsp, 8
        retq


Or on 32-bit X86:

bar:                                    # @bar
        .cfi_startproc
# %bb.0:                                # %entry
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register %rbp
        subq    $16, %rsp
        movl    $1, %edi
        movl    $2, %esi
        callq   foo
        movl    $3, %edi
        movl    $4, %esi
        movl    %eax, -4(%rbp)          # 4-byte Spill
        callq   foo
        movl    %eax, -8(%rbp)          # 4-byte Spill
        addq    $16, %rsp
        popq    %rbp
        .cfi_def_cfa %rsp, 8
        retq


On Fri, Sep 14, 2018 at 8:16 AM palpar via llvm-dev <llvm-dev@lists.llvm.org> wrote:
Hi everyone,

I found that LLVM generates redundant code when calling functions with constant parameters, with optimizations disabled.

Consider the following C code snippet:

int foo(int x, int y);

void bar()
{
    foo(1, 2);
    foo(3, 4);
}

Clang/LLVM 6.0 generates the following assembly code:
_bar:
        subl    $32, %esp
        movl    $1, %eax
        movl    $2, %ecx
        movl    $1, (%esp)
        movl    $2, 4(%esp)
        movl    %eax, 28(%esp)
        movl    %ecx, 24(%esp)
        calll   _foo
        movl    $3, %ecx
        movl    $4, %edx
        movl    $3, (%esp)
        movl    $4, 4(%esp)
        movl    %eax, 20(%esp)
        movl    %ecx, 16(%esp)
        movl    %edx, 12(%esp)
        calll   _foo
        movl    %eax, 8(%esp)
        addl    $32, %esp
        retl

Note how the constants are loaded into registers, but when the arguments are stored on the stack for the call, the immediate values are used instead. The registers are still saved to the stack, probably because that is the caller's responsibility once they have been used (which seems expected).
I think the problem comes from the fact that LLVM unconditionally allocates a register for each argument value, regardless of whether it is used later.
If the program's stack space is sufficiently large this is probably not a problem, but otherwise a large number of such calls can lead to stack overflow, even without recursion. Do you think I should file a bug report for this?

(Similarly, the return value of the function need not be saved, but the LLVM IR that Clang generates assigns the result of each call, so at this point LLVM couldn't possibly know:
define void @bar() #0 {
  %call = call i32 @foo(i32 1, i32 2)
  %call1 = call i32 @foo(i32 3, i32 4)
  ret void
}
)

Thanks,
Alpar
_______________________________________________
LLVM Developers mailing list
llvm-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev