Rich Disassembler for LLDB

Description

Use the variable location information from the debug info to annotate LLDB’s disassembler (and register read) output with the location and lifetime of source variables. The rich disassembler output should be exposed as structured data and made available through LLDB’s scripting API, so that more tooling can be built on top of it. In a terminal, LLDB should render the annotations as text.
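
To make "structured data" concrete, here is a minimal sketch of what one annotation record might look like and how the same record could be rendered either as JSON for tooling or as text for a terminal. The names VariableAnnotation, to_json and to_text are hypothetical and are not part of LLDB's scripting API; the example values are taken from the location list for i shown under "Expected outcomes".

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VariableAnnotation:
    """Hypothetical record: where one source variable lives over an address range."""

    name: str      # source-level variable name, e.g. "i"
    start: int     # first address covered (inclusive)
    end: int       # first address no longer covered (exclusive)
    location: str  # human-readable location, e.g. "r15" or a constant like "0"


def to_json(annotations):
    """Serialize annotations so that external tools can consume them."""
    return json.dumps([asdict(a) for a in annotations], indent=2)


def to_text(annotation):
    """Render one annotation the way a terminal might show it."""
    return f"{annotation.name}={annotation.location}"


# Example value based on the DW_OP_reg15 entry of the location list for `i`.
example = VariableAnnotation("i", 0x100000F77, 0x100000F91, "r15")

if __name__ == "__main__":
    print(to_text(example))
    print(to_json([example]))
```

Keeping the record independent of its rendering is one possible way to serve both the scripting use case and the terminal use case from the same data.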

Expected outcomes

For example, we could augment the disassembly for the following function

frame #0: 0x0000000100000f80 a.out`main(argc=1, argv=0x00007ff7bfeff1d8) at demo.c:4:10 [opt]
  1   void puts(const char*);
  2   int main(int argc, char **argv) {
  3    for (int i = 0; i < argc; ++i)
→ 4      puts(argv[i]);
  5    return 0;
  6   }
(lldb) disassemble
a.out`main:
...
  0x100000f71 <+17>: movl  %edi, %r14d
  0x100000f74 <+20>: xorl  %r15d, %r15d
  0x100000f77 <+23>: nopw  (%rax,%rax)
→  0x100000f80 <+32>: movq  (%rbx,%r15,8), %rdi
  0x100000f84 <+36>: callq 0x100000f9e ; symbol stub for: puts
  0x100000f89 <+41>: incq  %r15
  0x100000f8c <+44>: cmpq  %r15, %r14
  0x100000f8f <+47>: jne 0x100000f80 ; <+32> at demo.c:4:10
  0x100000f91 <+49>: addq  $0x8, %rsp
  0x100000f95 <+53>: popq  %rbx
...

using the debug information that LLDB also has access to (observe how the source variable i is in r15 over the address range [0x100000f77, 0x100000f91), plus the load-address slide)

$ dwarfdump demo.dSYM --name i
demo.dSYM/Contents/Resources/DWARF/demo: file format Mach-O 64-bit x86-64
0x00000076: DW_TAG_variable
 DW_AT_location (0x00000098: 
 [0x0000000100000f60, 0x0000000100000f77): DW_OP_consts +0, DW_OP_stack_value
 [0x0000000100000f77, 0x0000000100000f91): DW_OP_reg15 R15)
 DW_AT_name ("i")
 DW_AT_decl_file ("/tmp/t.c")
 DW_AT_decl_line (3)
 DW_AT_type (0x000000b2 "int")

to produce output like this, where we annotate when a variable is live and what its location is:

(lldb) disassemble
a.out`main:
...                                                               ; i=0
  0x100000f74 <+20>: xorl  %r15d, %r15d                           ; i=r15
  0x100000f77 <+23>: nopw  (%rax,%rax)                            ; |
→  0x100000f80 <+32>: movq  (%rbx,%r15,8), %rdi                   ; |
  0x100000f84 <+36>: callq 0x100000f9e ; symbol stub for: puts    ; |
  0x100000f89 <+41>: incq  %r15                                   ; |
  0x100000f8c <+44>: cmpq  %r15, %r14                             ; |
  0x100000f8f <+47>: jne 0x100000f80 ; <+32> at demo.c:4:10       ; |
  0x100000f91 <+49>: addq  $0x8, %rsp                             ; i=undef
  0x100000f95 <+53>: popq  %rbx

The goal would be to produce output like this for a subset of unambiguous cases, for example, variables that are constant or fully in registers.
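
As a rough illustration of the unambiguous case, the following sketch maps each instruction address to an annotation by looking it up in the half-open address ranges of the location list for i. It is a standalone model, not LLDB code: LocationRange, location_at and annotate are hypothetical names, and the ranges and addresses are copied by hand from the dwarfdump and disassemble output above. Because it follows the DWARF ranges strictly, the switch to r15 shows up at 0x100000f77 rather than on the preceding xorl as in the mock-up.

```python
from typing import List, NamedTuple, Optional, Tuple


class LocationRange(NamedTuple):
    start: int     # inclusive
    end: int       # exclusive
    location: str  # simplified location: a register name or a constant


# Location list for `i`, transcribed from the dwarfdump output.
I_LOCATIONS = [
    LocationRange(0x100000F60, 0x100000F77, "0"),    # DW_OP_consts +0, DW_OP_stack_value
    LocationRange(0x100000F77, 0x100000F91, "r15"),  # DW_OP_reg15
]

# Instruction addresses, transcribed from the disassembly listing.
ADDRESSES = [
    0x100000F71, 0x100000F74, 0x100000F77, 0x100000F80, 0x100000F84,
    0x100000F89, 0x100000F8C, 0x100000F8F, 0x100000F91, 0x100000F95,
]


def location_at(ranges: List[LocationRange], pc: int) -> Optional[str]:
    """Return the variable's location at `pc`, or None if no range covers it."""
    for r in ranges:
        if r.start <= pc < r.end:
            return r.location
    return None


def annotate(name: str, ranges: List[LocationRange],
             addresses: List[int]) -> List[Tuple[int, str]]:
    """Pair each address with an annotation string.

    A new location prints "name=location", an unchanged location prints "|",
    and the first address after the variable stops being described prints
    "name=undef". Addresses with no information before or after print "".
    """
    result = []
    previous = None
    for pc in addresses:
        current = location_at(ranges, pc)
        if current is None:
            text = f"{name}=undef" if previous is not None else ""
        elif current == previous:
            text = "|"
        else:
            text = f"{name}={current}"
        result.append((pc, text))
        previous = current
    return result


if __name__ == "__main__":
    for pc, text in annotate("i", I_LOCATIONS, ADDRESSES):
        print(f"{pc:#x} ; {text}")
```

A real implementation would need to evaluate arbitrary DWARF expressions and handle several variables per instruction; restricting the sketch to constants and single registers mirrors the subset of cases named above.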

Confirmed mentors and their contacts

Required / desired skills

Required:

Desired:

Size of the project.

medium (~175h)

An easy, medium or hard rating if possible

hard