LLDB disassembly of RISC-V extension instructions

With the LLVM 20.1.1 release, assembling a standard cryptography extension instruction (aes32dsi) and disassembling it with llvm-objdump both work as expected:

int main() {
asm("aes32dsi x1, x2, x3, 2");
return 0;
}

clang --target=riscv32 -march=rv32i_zknd -c main.c -g -o main_aes.o
llvm-objdump -d main_aes.o
...
      18: aa3100b3      aes32dsi        ra, sp, gp, 0x2
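
As a cross-check on the objdump line above, here is a minimal Python sketch that decodes the word 0xaa3100b3 by hand. The field layout in the comment is my reading of the Zknd encoding of aes32dsi, so treat it as an assumption. `decode_aes32dsi` and `ABI_NAMES` are hypothetical names used only for this illustration, not anything from LLVM or LLDB.

```python
# Assumed field layout for aes32dsi (RISC-V scalar crypto, Zknd):
#   bs[31:30] | 10101[29:25] | rs2[24:20] | rs1[19:15] | 000[14:12] | rd[11:7] | 0110011[6:0]

# ABI register names for x0..x31.
ABI_NAMES = (
    ["zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2", "s0", "s1"]
    + [f"a{i}" for i in range(8)]
    + [f"s{i}" for i in range(2, 12)]
    + [f"t{i}" for i in range(3, 7)]
)


def decode_aes32dsi(word):
    """Return a disassembly string if `word` matches the assumed aes32dsi layout, else None."""
    opcode = word & 0x7F
    rd = (word >> 7) & 0x1F
    funct3 = (word >> 12) & 0x7
    rs1 = (word >> 15) & 0x1F
    rs2 = (word >> 20) & 0x1F
    funct5 = (word >> 25) & 0x1F
    bs = (word >> 30) & 0x3
    if opcode != 0x33 or funct3 != 0 or funct5 != 0b10101:
        return None
    return f"aes32dsi {ABI_NAMES[rd]}, {ABI_NAMES[rs1]}, {ABI_NAMES[rs2]}, {hex(bs)}"


print(decode_aes32dsi(0xAA3100B3))
```

The word falls in the same major opcode (0x33) as the base register-register arithmetic instructions, which is consistent with a decoder rejecting it unless the extension is enabled.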

However, LLDB doesn’t disassemble this instruction:

(lldb) file main_aes.o
(lldb) di -s 0x18
error: Failed to disassemble memory at 0x00000018.

The recent addition of CPU and feature overrides for disassembly (llvm/llvm-project pull request #115382, “[lldb] Support overriding the disassembly CPU & features”, by JDevlieghere) looks promising, and it does appear to parse the feature as an extension specifier, but it still doesn’t disassemble the instruction:

(lldb) di -s 0x18 -c 40 --features zknd
error: Failed to disassemble memory at 0x00000018.
(lldb) di -s 0x18 -c 40 --features zbadextension
'zbadextension' is not a recognized feature for this target (ignoring feature)

Is there some other setting that I’m missing?

Just in case, can you try memory read 0x18?

“Failed to disassemble” could just be poor error reporting for “could not read memory”.

Also double-check that the 0x18 from the object file does in fact end up at 0x18 in the loaded program. dis -n main would be a way to do the disassembly without needing the literal address.

tomg March 25, 2025, 2:44pm 3

Apologies; I overly minimised the details when keeping the snippets brief. The rest of the disassembly is complete and matches the objdump output until it quietly terminates at the undisassembled instruction:

(lldb) di -s 0 -c 40 --features zknd
main_aes.o`.L0 :
main_aes.o[0x0] <+0>:   addi   sp, sp, -0x10
main_aes.o[0x4] <+4>:   sw     ra, 0xc(sp)

main_aes.o`clang version 20.1.1:
main_aes.o[0x8] <+0>:   sw     s0, 0x8(sp)
main_aes.o[0xc] <+4>:   addi   s0, sp, 0x10
main_aes.o[0x10] <+8>:  li     a0, 0x0
main_aes.o[0x14] <+12>: sw     a0, -0xc(s0)

----
$ llvm-objdump -d main_aes.o

main_aes.o:     file format elf32-littleriscv

Disassembly of section .text:

00000000 <main>:
       0: ff010113      addi    sp, sp, -0x10
       4: 00112623      sw      ra, 0xc(sp)
       8: 00812423      sw      s0, 0x8(sp)
       c: 01010413      addi    s0, sp, 0x10
      10: 00000513      li      a0, 0x0
      14: fea42a23      sw      a0, -0xc(s0)
      18: aa3100b3      aes32dsi        ra, sp, gp, 0x2
      1c: 00c12083      lw      ra, 0xc(sp)
      20: 00812403      lw      s0, 0x8(sp)
      24: 01010113      addi    sp, sp, 0x10
      28: 00008067      ret

@tomg the features flag will give you what you want, but it uses the format that the LLVM disassembler takes for its features string, where each extension is prefixed with + to enable it. Try --features +zknd. This replaces the automatic extensions that lldb turns on, including compressed, so if you want them as well, try +c,+zknd.
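
To make the format concrete, here is a small Python sketch that builds such a features string from plain extension names. `llvm_feature_string` and its `keep_defaults` parameter are hypothetical and exist only for this example. Defaulting to `c` is an assumption based on the note that compressed is one of the extensions lldb normally turns on; there may be others.

```python
def llvm_feature_string(extensions, keep_defaults=("c",)):
    """Build an LLVM-style feature string such as '+c,+zknd'.

    `keep_defaults` lists extensions to re-enable explicitly, since an
    explicit features string replaces the ones lldb enables on its own.
    Only enabling ('+') is handled in this sketch.
    """
    names = []
    for ext in list(keep_defaults) + list(extensions):
        ext = ext.strip().lower().lstrip("+")
        if ext and ext not in names:  # skip blanks and duplicates
            names.append(ext)
    return ",".join("+" + name for name in names)


print(llvm_feature_string(["zknd"]))                    # keeps compressed enabled
print(llvm_feature_string(["zknd"], keep_defaults=()))  # zknd only
```

The result is what you would pass to --features on the disassemble command.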

My team has recently added build attribute support for lldb disassembly for RISC-V and Hexagon. We’ll be upstreaming it soon, and people can use that framework to add support for their extensions.

@DavidSpickett we noticed the same thing when adding support for the Xqci extension - the RISC-V disassembler spits out an error instead of a warning when it sees instructions that it can’t disassemble. This gets a bit comical when you are disassembling a range with bytes turned on. You see bytes for the good instructions, and errors for the bad instructions.

tomg March 25, 2025, 2:51pm 5

Prefixing the extension name with ‘+’ does indeed work - thanks very much!

(lldb) di -s 0 -c 40 --features +zknd
main_aes.o`.L0 :
main_aes.o[0x0] <+0>:   addi   sp, sp, -0x10
main_aes.o[0x4] <+4>:   sw     ra, 0xc(sp)

main_aes.o`clang version 20.1.1:
main_aes.o[0x8] <+0>:   sw     s0, 0x8(sp)
main_aes.o[0xc] <+4>:   addi   s0, sp, 0x10
main_aes.o[0x10] <+8>:  li     a0, 0x0
main_aes.o[0x14] <+12>: sw     a0, -0xc(s0)
main_aes.o[0x18] <+16>: aes32dsi ra, sp, gp, 0x2
main_aes.o[0x1c] <+20>: lw     ra, 0xc(sp)
main_aes.o[0x20] <+24>: lw     s0, 0x8(sp)
main_aes.o[0x24] <+28>: addi   sp, sp, 0x10
main_aes.o[0x28] <+32>: ret

You’re welcome!

Something interesting: our latest 20.0 lldb doesn’t stop when it can’t disassemble an instruction. Instead it prints nothing for that instruction.

With compressed instructions on:
(lldb) dis -n main
factrv32`main:
factrv32[0x104ce] <+0>: addi sp, sp, -0x20
factrv32[0x104d0] <+2>: sw ra, 0x1c(sp)
factrv32[0x104d2] <+4>: sw s0, 0x18(sp)
factrv32[0x104d4] <+6>: addi s0, sp, 0x20
factrv32[0x104d6] <+8>: li a2, 0x0
factrv32[0x104d8] <+10>: sw a2, -0x1c(s0)
factrv32[0x104dc] <+14>: sw a2, -0xc(s0)

With compressed instructions off:
(lldb) dis -n main -Y -c
factrv32`main:
factrv32[0x104ce] <+0>:
factrv32[0x104d0] <+2>:
factrv32[0x104d2] <+4>:
factrv32[0x104d4] <+6>:
factrv32[0x104d6] <+8>:
factrv32[0x104d8] <+10>: sw a2, -0x1c(s0)
factrv32[0x104dc] <+14>: sw a2, -0xc(s0)

With compressed instructions off and bytes shown:
(lldb) dis -n main -Y -c -b
factrv32`main:
factrv32[0x104ce] <+0>: < invalid>
factrv32[0x104d0] <+2>: < invalid>
factrv32[0x104d2] <+4>: < invalid>
factrv32[0x104d4] <+6>: < invalid>
factrv32[0x104d6] <+8>: < invalid>
factrv32[0x104d8] <+10>: 23 22 c4 fe sw a2, -0x1c(s0)
factrv32[0x104dc] <+14>: 23 2a c4 fe sw a2, -0xc(s0)

We’ll look into that and get it fixed.

(space added before “invalid” to defeat discourse formatting)
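
For anyone wondering why only the instructions at 2-byte spacing go missing in the listings above: in the standard RISC-V encoding, the low bits of the first halfword give the instruction length, so the 16-bit ones are the compressed encodings that can't be decoded once that extension is off. A minimal Python sketch of the length rule follows. `rv_insn_length` is a hypothetical helper for illustration, not an lldb or LLVM function, and it deliberately ignores encodings longer than 32 bits.

```python
def rv_insn_length(first_halfword):
    """Return the instruction length in bytes from its first 16 bits.

    Returns None for encodings longer than 32 bits, which this sketch
    does not try to classify.
    """
    if first_halfword & 0b11 != 0b11:
        return 2  # low two bits other than 11: 16-bit compressed encoding
    if (first_halfword >> 2) & 0b111 != 0b111:
        return 4  # low two bits 11, bits [4:2] not 111: standard 32-bit encoding
    return None


# First halfword of the `sw a2, -0x1c(s0)` bytes shown above (23 22 c4 fe, little-endian).
sw_halfword = int.from_bytes(bytes.fromhex("2322c4fe")[:2], "little")
print(rv_insn_length(sw_halfword))  # 32-bit instruction
print(rv_insn_length(0x0001))       # c.nop, a 16-bit compressed instruction
```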

tomg March 25, 2025, 3:19pm 7

With the 20.1.1 release, the behavior with the (uncompressed) aes32dsi instruction was to halt quietly when disassembly failed. This wasn’t as helpful as an explicit error, since it left open other explanations: a limit on the context being disassembled, a bad symbol range, missing options such as --count or --force, and so on.

Is the behavior you just showed, where it continues through the unrecognized instructions, something implemented after 20.1.1, or is it specific to compressed instruction handling?

@JDevlieghere would be cool if we added this gotcha about clang vs llvm-mc options to the help text:

(lldb) settings list target.disassembly-features
  target.disassembly-features -- Specify additional CPU features for disassembling.

And use RISC-V as an example to reinforce the meaning.

That’s the default behavior for our downstream 20.0. It’s not correct, but it’s better than what you’re seeing. We need to make sure upstream moves to at least what we’ve got downstream, and better yet would be to make it print something like “unknown opcode” instead of throwing an error.