[cfe-dev] compiling CUDA w/ -fdebug-default-version=5 generates invalid PTX (original) (raw)

David Blaikie via cfe-dev cfe-dev at lists.llvm.org
Thu Dec 3 12:39:12 PST 2020


On Thu, Dec 3, 2020 at 12:29 PM Artem Belevich via cfe-dev < cfe-dev at lists.llvm.org> wrote:

On Thu, Dec 3, 2020 at 11:57 AM Alexey.Bataev <a.bataev at outlook.com> wrote: Hi Artem, here is what I found about this.

These labels are emitted only if DWARF 4 or 5 is used. They are required for emission of the DWATcallsite attribute. The info about callsites also emitted for lineinfo emission with debug info for profiling like in your example (-O1 -gmlt). Thank you for looking into this. Call/CallUni instruction is treated as a separate instruction by the debug info generator and it treats it as a separate call and emits labels for it. Would it help if we were to fold the whole call.uni into a single line? It appears that swarf injects the label after the call instruction line gets printed. What is the label supposed to point at? At the call instruction itself? Or at the return point?

I believe it's immediately after the call.

Testing an x86 example:

$ cat test.c

void f1();

void f2();

void f3() {

f1();

f2();

}

$ clang-tot -gdwarf-5 test.c -c -O3 && llvm-dwarfdump-tot test.o | grep "DW_TAG|call" && llvm-objdump -d test.o

0x0000000c: DW_TAG_compile_unit

0x00000023: DW_TAG_subprogram

            DW_AT*_call_*all_calls    (true)

0x0000002e: *DW_TAG_call_*site

              DW_AT*_call_*origin     (0x0000003b)

              DW_AT*_call_*return_pc  (0x0000000000000008)

0x00000034: *DW_TAG_call_*site

              DW_AT*_call_*origin     (0x00000041)

              DW_AT*_call_*tail_call  (true)

              DW_AT*_call_*pc (0x000000000000000b)

0x0000003b: DW_TAG_subprogram

0x0000003f: DW_TAG_unspecified_parameters

0x00000041: DW_TAG_subprogram

0x00000045: DW_TAG_unspecified_parameters

test.o: file format elf64-x86-64

Disassembly of section .text:

0000000000000000 :

   0: 50                            pushq   %rax

   1: 31 c0                         xorl    %eax, %eax

   3: e8 00 00 00 00                callq   0x8 <f3+0x8>

   8: 31 c0                         xorl    %eax, %eax

   a: 59                            popq    %rcx

   b: e9 00 00 00 00                jmp     0x10 <f3+0x10>

But the tail call uses the jump location, because it can't do anything else.

You can try to mark the CallUni instructions as hasDelaySlot = 1 or stop

treating it as a call instruction in NVPTXInstrInfo.td. Can't say which one is better/correct. Looks like the representation for call/call.uni is not quite compatible with the debug info

David suggested not allowing -fdebug-default-version=5 to override the DWARF version provided by NVPTX back-end. That would make sense, considering that we can't handle the newer DWARF versions anyways. Let me see if I can fix the lineinfo generation first.

Yeah, -gdwarf-5 and -fdebug-default-version=5 -g should behave the same. So whatever it is that's disabling/downgrading to DWARFv2 for NVPTX when the user uses -gdwarf-5 shoudl do the same for -fdebug-default-version=5

--Artem

------------- Best regards, Alexey Bataev

12/2/2020 6:00 PM, Artem Belevich пишет: Hi, Alexey! I've ran into an odd case with debug info generation in NVPTX. Reproduction: ------------------------ device attribute((noinline)) void bar() { printf("Hi!"); } global void foo() { bar(); } int main(){} ------------------------ $ clang++ -v --cuda-gpu-arch=sm70 --cuda-device-only -fdebug-default-version=5 a.cu -gmlt -O1 -c Compilation fails due to a syntax error reported by ptxas. The reason for the error is that clang generates a label in the middle of a call.uni instuction. E.g: { // callseq 1, 0 .reg .b32 tempparamreg; call.uni Ltmp14: Z3barv, ( ); } // callseq 1 The odd part is that we're only generating line info and there is no DWARF in the generated PTX. It appears that this behavior is triggered by -dwarf-version=5 passed to cc1. Looks like another case where PTX syntax breaks DWARF generator assumptions. It's possible to work around it with an additional -Xarchdevice_ _-fdebug-default-version=2, but I'd appreciate it if you could take a look and see if that could be fixed. -- --Artem Belevich

-- --Artem Belevich


cfe-dev mailing list cfe-dev at lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20201203/8f9c329c/attachment-0001.html>



More information about the cfe-dev mailing list