
Hi Jason,

The different behavior between Linux and Windows comes from the difference in calling conventions. Windows uses 4 registers for argument passing, while Linux uses 6.

https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-160#parameter-passing
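
To make the register counts concrete, here is a rough Python sketch of where each scalar argument lands under the two conventions. It is a simplified model, not the full ABI: it only handles scalar integer and floating-point arguments, the function names and the "int"/"fp" labels are made up for illustration, and it assumes (as described in this thread) that the half arrives as a float.

```python
# Simplified model of scalar argument placement on x86-64.
# Assumption: no structs, varargs, or long double; names are illustrative only.
WIN64_INT = ["rcx", "rdx", "r8", "r9"]
WIN64_FP = ["xmm0", "xmm1", "xmm2", "xmm3"]
SYSV_INT = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
SYSV_FP = ["xmm%d" % i for i in range(8)]


def win64_locations(kinds):
    """Windows x64: the first four arguments share four positional slots."""
    locations = []
    for position, kind in enumerate(kinds):
        if position < 4:
            regs = WIN64_FP if kind == "fp" else WIN64_INT
            locations.append(regs[position])
        else:
            # The fifth and later arguments are passed on the stack.
            locations.append("stack")
    return locations


def sysv_locations(kinds):
    """System V x86-64: integer and FP registers are counted separately."""
    locations = []
    used_int = 0
    used_fp = 0
    for kind in kinds:
        if kind == "fp" and used_fp < len(SYSV_FP):
            locations.append(SYSV_FP[used_fp])
            used_fp += 1
        elif kind == "int" and used_int < len(SYSV_INT):
            locations.append(SYSV_INT[used_int])
            used_int += 1
        else:
            locations.append("stack")
    return locations


# foo(i8, i8, i8, i8, half): the half is the fifth argument.
five_args = ["int", "int", "int", "int", "fp"]
# foo(i8, i8, i8, half): the half is the fourth argument.
four_args = ["int", "int", "int", "fp"]

print(win64_locations(five_args))
print(sysv_locations(five_args))
print(win64_locations(four_args))
```

In this model the fifth argument goes to the stack on Windows but to xmm0 on Linux, and with only four arguments Windows places it in xmm3, which matches the `movaps xmm0, xmm3` in the four-argument listing quoted below.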

Thanks

Pengfei

From: llvm-dev On Behalf Of Jason Hafer via llvm-dev
Sent: Friday, March 5, 2021 10:21 PM
To: Craig Topper
Cc: llvm-dev@lists.llvm.org
Subject: Re: \[llvm-dev\] Is it legal to pass a half by value on x86\_64?

Hi All,

Thank you very much for all the great information. This is awesome!

To circle back on Craig's questions.

I did notice that LLVM 11 behaves very differently.

\*\* Per: What does "incorrect math operations" mean?

The half is passed to the function as a float. The function does operations with other half numbers. On Windows, when we don't get the float-to-half conversion, the input is always truncated to 0.0.
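
A small Python sketch of why a skipped conversion can show up as 0.0. This is only one plausible reading of the symptom, not something confirmed by the generated code: it assumes the callee ends up treating the low 16 bits of the incoming float as if they were already a half. The helper names are made up; `struct`'s `"e"` format is used as a stand-in for the float-to-half conversion that \_\_gnu\_f2h\_ieee performs.

```python
import struct


def float_bits(value):
    """Return the 32-bit IEEE 754 single-precision pattern of value."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def half_from_bits(bits16):
    """Interpret a 16-bit pattern as an IEEE 754 half."""
    return struct.unpack("<e", struct.pack("<H", bits16))[0]


def converted_half_bits(value):
    """Proper float-to-half conversion (stand-in for __gnu_f2h_ieee)."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def low16_of_float(value):
    """Assumed failure mode: reuse the low 16 bits of the float unchanged."""
    return float_bits(value) & 0xFFFF


for value in (2.0, 1.5):
    good = converted_half_bits(value)
    bad = low16_of_float(value)
    print(value, hex(good), half_from_bits(good), hex(bad), half_from_bits(bad))
```

For values such as 2.0 or 1.5, whose single-precision mantissa has no bits set in the low 16 positions, the unconverted low half is the bit pattern 0x0000, which reads back as 0.0.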

\*\* Per: "Do you have a more complete IR file for Windows that I can take a look at?"

I can get you our IR if you want, but I think it is more convoluted than required. I was working on a unit test and I think all one needs to see the anomaly is:

define void @foo(i8, i8, i8, i8, half) {

; CHECK-I686: callq \_\_gnu\_f2h\_ieee

%6 = alloca half

store half %4, half\* %6, align 1

ret void

}

x86\_64-pc-windows gives:

push rax

.seh\_stackalloc 8

.seh\_endprologue

movss xmm0, dword ptr \[rsp + 48\] # xmm0 = mem\[0\],zero,zero,zero

movss dword ptr \[rsp + 4\], xmm0 # 4-byte Spill

pop rax

ret

.seh\_handlerdata

.text

.seh\_endproc

What I find extremely interesting is that the behavior seems to have something to do with the stack. If I drop the number of inputs by one, then even Windows generates the conversion.

define void @foo(i8, i8, i8, half) {

; CHECK-I686: callq \_\_gnu\_f2h\_ieee

%5 = alloca half

store half %3, half\* %5, align 1

ret void

}

x86\_64-pc-windows gives:

sub rsp, 40

.seh\_stackalloc 40

.seh\_endprologue

movabs rax, offset \_\_gnu\_f2h\_ieee

movaps xmm0, xmm3

call rax

mov word ptr \[rsp + 38\], ax

add rsp, 40

ret

.seh\_handlerdata

.text

.seh\_endproc

\*\* If interested, here is a dissection of our real asm.
For both Windows and Linux our IR calls c2\_foo() with a half(2):

...

call void @c2\_foo(i8\* %S\_6, \[21 x i8\*\]\* %ptr\_gvar\_instance\_7, %emlrtStack\* %c2\_b\_st\_, \[18 x float\]\* @15, half 0xH4000, \[18 x i8\]\* %t10)

They both register this in c2\_foo as:

...

%c2\_in2\_ = alloca half

store half %c2\_in2, half\* %c2\_in2\_, align 1

When we compile them, they both send 0x40000000 to c2\_foo (a single-precision float).
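
As a sanity check on the two constants, the following Python sketch decodes the fields by hand and shows that half 0xH4000 and single 0x40000000 both encode 2.0. The `decode_ieee` helper is hypothetical and only handles normal numbers.

```python
def decode_ieee(bits, exp_bits, frac_bits):
    """Decode a normal IEEE 754 binary value from its raw bit pattern."""
    sign = bits >> (exp_bits + frac_bits)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1
    if exponent == 0 or exponent == (1 << exp_bits) - 1:
        # Zeros, subnormals, infinities and NaNs are out of scope here.
        raise ValueError("only normal numbers are handled in this sketch")
    magnitude = (1 + fraction / (1 << frac_bits)) * 2.0 ** (exponent - bias)
    return -magnitude if sign else magnitude


# half: 1 sign bit, 5 exponent bits (bias 15), 10 fraction bits
half_value = decode_ieee(0x4000, exp_bits=5, frac_bits=10)
# single: 1 sign bit, 8 exponent bits (bias 127), 23 fraction bits
single_value = decode_ieee(0x40000000, exp_bits=8, frac_bits=23)

print(half_value, single_value)
```

The two patterns differ only because of the field widths and exponent bias, so the caller is passing the same value, just widened to a float.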

The Linux c2\_foo() asm addresses this with a float2half conversion:

...

mov qword ptr \[rsp + 448\], rdi

mov qword ptr \[rsp + 440\], rsi

mov qword ptr \[rsp + 432\], rdx

mov qword ptr \[rsp + 424\], rcx

movabs rcx, offset \_\_gnu\_f2h\_ieee # <---Convert Here

mov qword ptr \[rsp + 336\], r8 # 8-byte Spill

call rcx

mov word ptr \[rsp + 422\], ax

mov rcx, qword ptr \[rsp + 336\] # 8-byte Reload

mov qword ptr \[rsp + 408\], rcx

mov qword ptr \[rsp + 392\], 0

mov qword ptr \[rsp + 384\], 0

mov qword ptr \[rsp + 376\], 0

mov qword ptr \[rsp + 368\], 0

mov rdx, qword ptr \[rsp + 432\]

mov qword ptr \[rsp + 360\], rdx

mov rdx, qword ptr \[rsp + 432\]

mov rdx, qword ptr \[rdx + 8\]

mov qword ptr \[rsp + 352\], rdx

mov rdx, qword ptr \[rsp + 440\]

mov rdx, qword ptr \[rdx + 56\]

mov qword ptr \[rsp + 344\], rdx

mov dword ptr \[rsp + 400\], 0

jmp .LBB9\_9

The Windows c2\_foo() asm is missing this conversion but treats the value as if it has been converted.

...

mov rax, qword ptr \[rsp + 424\]

movss xmm0, dword ptr \[rsp + 416\] # xmm0 = mem\[0\],zero,zero,zero # <-- moves the data like it wants to convert but never does

mov qword ptr \[rsp + 344\], rcx

mov qword ptr \[rsp + 336\], rdx

mov qword ptr \[rsp + 328\], r8

mov qword ptr \[rsp + 320\], r9

mov qword ptr \[rsp + 304\], 0

mov qword ptr \[rsp + 296\], 0

mov qword ptr \[rsp + 288\], 0

mov qword ptr \[rsp + 280\], 0

mov rcx, qword ptr \[rsp + 328\]

mov qword ptr \[rsp + 272\], rcx

mov rcx, qword ptr \[rsp + 328\]

mov rcx, qword ptr \[rcx + 8\]

mov qword ptr \[rsp + 264\], rcx

mov rcx, qword ptr \[rsp + 336\]

mov rcx, qword ptr \[rcx + 56\]

mov qword ptr \[rsp + 256\], rcx

mov dword ptr \[rsp + 312\], 0

mov qword ptr \[rsp + 248\], rax # 8-byte Spill

movss dword ptr

From: Wang, Pengfei <pengfei.wang@intel.com>
Sent: Friday, March 5, 2021 7:30 AM
To: Sjoerd Meijer <Sjoerd.Meijer@arm.com>; Jason Hafer <jhafer@mathworks.com>
Cc: llvm-dev <llvm-dev@lists.llvm.org>
Subject: RE: Is it legal to pass a half by value on x86\_64?

I guess it’s designed for language portability: you can use this type across different platforms. That said, I’m not a front-end expert, so I cannot think of other intentions.

The \_Float16 is a primitive type in the latest x86 ABI, but there’s no X86 target that supports it yet, so you cannot use it on X86 for now. I think that’s the difference from \_\_fp16, and why you would use \_\_fp16.

We also have some discussion here. https://reviews.llvm.org/D97318

Thanks

Pengfei

From: Sjoerd Meijer <Sjoerd.Meijer@arm.com>
Sent: Friday, March 5, 2021 5:49 PM
To: Jason Hafer <jhafer@mathworks.com>; Wang, Pengfei <pengfei.wang@intel.com>
Cc: llvm-dev <llvm-dev@lists.llvm.org>
Subject: Re: Is it legal to pass a half by value on x86\_64?

\_\_fp16 is a pure storage format. You cannot pass it by value, because only types permitted by the ABI can be passed by value, and \_\_fp16 is not one of them.

Yep. Any specific reason to use a pure storage format? The native type is \_Float16 and would give some benefits, but this is not yet supported on x86, see also:

https://clang.llvm.org/docs/LanguageExtensions.html#half-precision-floating-point

Cheers,
Sjoerd.

From: llvm-dev <llvm-dev-bounces@lists.llvm.org> on behalf of Wang, Pengfei via llvm-dev <llvm-dev@lists.llvm.org>
Sent: 05 March 2021 06:28
To: Jason Hafer <jhafer@mathworks.com>
Cc: llvm-dev <llvm-dev@lists.llvm.org>
Subject: Re: \[llvm-dev\] Is it legal to pass a half by value on x86\_64?

Hi Jason,

\_\_fp16 is a pure storage format. You cannot pass it by value, because only types permitted by the ABI can be passed by value, and \_\_fp16 is not one of them.

if "define void @foo(i8, i8, i8, i8, half) " is even legal to use

half, as a target-independent type, is legal in LLVM. It’s not a legal type on targets that do not support it, such as X86. The behavior depends on how we lower it. But I don’t know why there are differences between Linux and Windows. Maybe because “\_\_gnu\_f2h\_ieee” is a Linux-only function?

Thanks

Pengfei

From: llvm-dev <llvm-dev-bounces@lists.llvm.org> On Behalf Of Jason Hafer via llvm-dev
Sent: Friday, March 5, 2021 10:46 AM
To: llvm-dev@lists.llvm.org
Cc: Jason Hafer <jhafer@mathworks.com>
Subject: \[llvm-dev\] Is it legal to pass a half by value on x86\_64?

Hello,

I am attempting to understand an anomaly I am seeing when dealing with half on Windows and could use some help.

Using LLVM 8 or 10, if I have IR of the flavor below:

define void @foo(i8, i8, i8, i8, half) {

%6 = alloca half

store half %4, half\* %6, align 1

...

ret void

}

Using x86\_64-pc-linux, we convert the float passed in with \_\_gnu\_f2h\_ieee.

Using x86\_64-pc-windows I do not get the conversion, so we end up with incorrect math operations.

While investigating I noticed clang gave me the error below:

error: parameters cannot have \_\_fp16 type; did you forget \* ?
void foo(int dc1, int dc2,int dc3,int dc4, \_\_fp16 in)

So, this got me wondering if "define void @foo(i8, i8, i8, i8, half) " is even legal to use or if I should pass by reference instead. I have yet to find documentation to convince me one way or the other. Thus, I was hoping someone here might be able to shed some light on the issue.

Thank you in advance!

Cheers,