[RFC] Dynamic sizes and field offsets in DWARF

I’ve been working on improving the DWARF output of an LLVM-based Ada compiler. I’ve landed a few relatively straightforward patches so far, but the next set of problems requires bigger changes, so I thought I would post here.

The main issue is that Ada is more dynamic than C or C++. My focus in this post is that, in Ada, the size of a given type can vary, as can the offsets of fields.

This means that these fields in DIType:

uint64_t SizeInBits;
uint64_t OffsetInBits;

… will need to be expressions in the general case. I still haven’t worked out all of the details of how to make this change.

One extra difficulty is that a packed record can have a field that is at a dynamic bit offset. I posted about this recently on dwarf-discuss:

https://lists.dwarfstd.org/pipermail/dwarf-discuss/2025-April/002666.html

The gist is that there is no way to represent this in DWARF 5.

The one response on the dwarf-discuss thread suggested using a DWARF extension: have the compiler allow an expression for DW_AT_data_bit_offset. This would be fine by me, but I wonder whether LLVM would accept it.

Now, GNAT doesn’t seem to need the full generality here. In particular, from what I can tell, the byte offset of a field may be non-constant, but the bit offset within that byte will always be constant.

Taking advantage of this observation, GCC emits a DWARF 3 construct to work around this problem. That is, it emits an expression for DW_AT_data_member_location, plus a constant DW_AT_bit_offset. (This approach was obsoleted in DWARF 4 and removed entirely from DWARF 5.)
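
To make the DWARF 3 scheme concrete, here is a minimal sketch of how a consumer could combine a byte offset (possibly the result of evaluating an expression at run time) with a constant DW_AT_bit_offset to locate a field. The function name and parameters are hypothetical. The sketch relies on DW_AT_bit_offset counting from the most significant bit of the storage unit, so on a little-endian target it has to be flipped to get a position counted from the start of the record.

```python
def field_start_bit(byte_offset, byte_size, bit_size, bit_offset, little_endian=True):
    """Return the field's first bit, counted from the start of the record.

    byte_offset: value of DW_AT_data_member_location (may be computed at run time)
    byte_size:   DW_AT_byte_size of the storage unit holding the field
    bit_size:    DW_AT_bit_size of the field
    bit_offset:  DW_AT_bit_offset, counted from the storage unit's MSB
    """
    storage_bits = byte_size * 8
    if little_endian:
        # MSB-relative offset converted to an offset from the unit's low bit.
        within_unit = storage_bits - bit_offset - bit_size
    else:
        # On big-endian targets the MSB comes first in memory already.
        within_unit = bit_offset
    return byte_offset * 8 + within_unit


# A 3-bit field in a 4-byte unit whose byte offset (12) was computed at run time.
print(field_start_bit(12, 4, 3, 5))
print(field_start_bit(12, 4, 3, 5, little_endian=False))
```

Only the byte offset varies at run time here; the bit offset within the storage unit stays a compile-time constant, which is the property of GNAT described above.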

So one idea would be to replicate this behavior for LLVM. It’s somewhat ugly, though, because in addition to the deprecation, two attributes would be needed where, ideally, just one would do.

It may be worth noting that, in Ada, not every type can have a variable size – I think just arrays and records. So, maybe there’s some possible refactoring where a constant size is available in many cases.

I’ve gone back and forth a few times, but my current plan is to go with the “DWARF 3” approach: for the offset, add an optional “dynamic offset in bytes” to DIType and, when this is set, generate the deprecated DWARF.

For the size, I suppose I will try changing SizeInBits to allow an expression. One wrinkle is that an expression here would necessarily compute the size in bytes, while the constant form is in bits. So, the API may be a little inconsistent.
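
One way to picture that wrinkle is a size that is either a constant in bits or an expression yielding bytes, so that whoever reads it has to normalise the unit. The following is only a sketch with hypothetical names; it is not LLVM's API, and the "expression" is modelled as a plain Python callable.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TypeSize:
    """Hypothetical stand-in for a size that is either constant or dynamic."""
    const_bits: Optional[int] = None                     # constant size, in bits
    expr_bytes: Optional[Callable[[dict], int]] = None   # run-time size, in bytes

    def in_bits(self, ctx: dict) -> int:
        # A constant is already in bits; an expression result must be scaled.
        if self.const_bits is not None:
            return self.const_bits
        if self.expr_bytes is None:
            raise ValueError("size is neither a constant nor an expression")
        return self.expr_bytes(ctx) * 8


fixed = TypeSize(const_bits=32)
# Size of a byte array whose length depends on a discriminant called "n".
dynamic = TypeSize(expr_bytes=lambda ctx: ctx["n"])

print(fixed.in_bits({}))
print(dynamic.in_bits({"n": 10}))
```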

I’d appreciate any thoughts you may have.

We support such a thing (run-time positioned/sized fields) in our Pascal compiler on Itanium (not LLVM based) using DWARF 3. We generate extra routines to compute/return the sizes and then use call_code in the data member location for the member and in the size field for the member’s type. This is one of the reasons we are sticking with DWARF 4 for OpenVMS.

For example,

type
  dynrec(i:integer) = record
                      ! Field with run-time size
                      f1 : packed array [1..i] of boolean;

                      ! Field with run-time size AND run-time position
                      f2 : packed array [1..i] of boolean;

                      ! Just an integer with run-time position
                      f3 : integer;
                      end;

procedure nested(p1:integer);
  var d : dynrec(p1);
  begin
  d := zero;
  d.f1[1] := true;
  d.f2[p1]:= true;
  d.f3    := 1234;
  writeln(d.f1[1],d.f2[p1],d.f3);
  end;

generates things like

00001ab3 000000d3  0000000a [0a] (base type) (level: 3)
00001ab4 000000d4           encoding:  05 (5) (signed)
00001ab5 000000d5           byte size:  [04] 0000000000000004 (4)
00001ab6 000000d6  0000000b [0b] (array) (level: 3)
00001ab7 000000d7           type:  ref: 00000146 (00000146)
00001abb 000000db           bit stride:  [01] 0000000000000001 (1)
00001abc 000000dc  0000000c [0c] (subrange) (level: 4)
00001abd 000000dd           type:  ref: 000000d3 (000000d3)
00001ac1 000000e1           lower bound:  [01] 0000000000000001 (1)
00001ac2 000000e2           upper bound:  block length: [09] 9 (9)
00001ac3 000000e3               e7 (call_code) $CODE$ + 810
00001acc 000000ec  00000000 [00] (NULL) (level: 4)

00001ae6 00000106  0000000e [0e] (member) (level: 4)
00001ae7 00000107           name:  "F1"
00001aea 0000010a           type:  ref: 000000d6 (000000d6)
00001aee 0000010e           data member location:  block length: [03] 3 (3)
00001aef 0000010f               10 (constu) [00] 0000000000000000 (0)
00001af1 00000111               22 (plus)

and all of that allows our OpenVMS debugger to print it at the writeln:

DBG> examine d
T\T\NESTED\D
    F1
        [1]:            TRUE
        [2]-[10]:       FALSE
    F2
        [1]-[9]:        FALSE
        [10]:           TRUE
    F3: 1234

We use those helper routines since our proprietary backend doesn’t let me write out more complicated location expressions. If it could, it should work.
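
For reference, the three-byte data member location block shown for F1 above (10 00 22) can be evaluated by a very small stack machine. The sketch below handles only DW_OP_constu and DW_OP_plus, and assumes, as DWARF specifies for DW_AT_data_member_location, that the address of the containing record is pushed before evaluation starts. The function names are illustrative, and the vendor opcode call_code (e7 in the dump) is deliberately not handled.

```python
DW_OP_constu = 0x10
DW_OP_plus = 0x22


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            return value, pos


def member_address(expr, record_address):
    """Evaluate a member location expression with the record address on the stack."""
    stack = [record_address]
    pos = 0
    while pos < len(expr):
        op = expr[pos]
        pos += 1
        if op == DW_OP_constu:
            value, pos = read_uleb128(expr, pos)
            stack.append(value)
        elif op == DW_OP_plus:
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs)
        else:
            raise NotImplementedError(f"opcode {op:#x} is not handled in this sketch")
    return stack[-1]


# F1's block: constu 0, plus. The member sits at the start of the record.
print(hex(member_address(bytes([0x10, 0x00, 0x22]), 0x1000)))
```

A member at a run-time position would need more than these two opcodes, which is where the helper routines come in.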

For x86 and LLVM, we are actually working on this very thing now using a similar scheme. I’ll point my engineers who are doing this work to this discussion to talk about what we are doing. As far as I know, we are not adding new DWARF, just adding some DI metadata for such things and then extending the DWARF generation code.

In Objective-C, for example, the offsets of instance variables in an object are not (necessarily) known at compile time, and LLDB knows that and ignores the offsets, which results in DWARF like this:

$ cat /tmp/t.m                           
@interface Foo {
int i, j;
};
@end;
Foo *foo;
$ dwarfdump /tmp/t.o --name Foo --show-children
/tmp/t.o:	file format Mach-O arm64

0x00000031: DW_TAG_structure_type
              DW_AT_name	("Foo")
              DW_AT_byte_size	(0x08)
              DW_AT_decl_file	("/tmp/t.m")
              DW_AT_decl_line	(1)
              DW_AT_APPLE_runtime_class	(DW_LANG_ObjC)

0x00000037:   DW_TAG_member
                DW_AT_name	("i")
                DW_AT_type	(0x0000004c "int")
                DW_AT_decl_file	("/tmp/t.m")
                DW_AT_decl_line	(2)
                DW_AT_data_member_location	(0x00)
                DW_AT_accessibility	(DW_ACCESS_protected)

0x00000041:   DW_TAG_member
                DW_AT_name	("j")
                DW_AT_type	(0x0000004c "int")
                DW_AT_decl_file	("/tmp/t.m")
                DW_AT_decl_line	(2)
                DW_AT_data_member_location	(0x00)
                DW_AT_accessibility	(DW_ACCESS_protected)

0x0000004b:   NULL

It would be nice to have the option to not emit a data member location or to emit an expression for it (I don’t think we could produce one for Objective-C; we need to ask the runtime, but maybe you can for Ada).

ararmine April 23, 2025, 12:37pm 4

Our LLVM-based Pascal compiler on x86 OpenVMS generates DWARF info for variant members, which are similar to the example discussed in dwarf-discuss.

LLVM-10 already had support for variant members with a single constant discriminant value.
One can call the DIBuilder::createVariantMemberType API to emit DIDerivedType metadata with the DW_TAG_member tag, which is subsequently processed in the back-end.
To support a list of discriminants, we extended the DIDerivedType class with a new operand of type DINodeArray, changed the signature of createVariantMemberType to take a DINodeArray argument, and implemented the corresponding DWARF output in the back-end.

For example:

program dbgrecvars;

type
  rec1 = packed record
        part: 1..99;
        case boolean of
        true : (
                t1 : integer;
                case t2 : integer of
                0,3,11..20: ( t20 : integer ) ;
                1: ( t21 : integer ) ;
                otherwise ( t2o : integer )
                );
        false : (
                f1 : real;
                case char of
                'A'..'Z' : (
                        cA : [bit(3)] 0..7;
                        cZ : [bit(2)] 0..3
                        );
                '0'..'9' : (
                        c0 : real;
                        )
                );
        end;

var
        vrec : rec1;

Output of llvm-dwarfdump:

0x0000009d:       DW_TAG_variant_part
                    DW_AT_discr	(0x000000a2)

0x000000a2:         DW_TAG_volatile_type
                      DW_AT_type	(0x00000047 "BOOLEAN")
                      DW_AT_byte_size	(0x04)
                      DW_AT_bit_size	(0x00)
                      DW_AT_bit_offset	(0x20)
                      DW_AT_data_member_location	(0x00)

0x000000ab:         DW_TAG_variant
                      DW_AT_discr_value	(0x01)

0x000000ad:           DW_TAG_member
                        DW_AT_name	("T1")
                        DW_AT_type	(0x000001de "INTEGER")
                        DW_AT_decl_file	("variants_test.pas;5")
                        DW_AT_decl_line	(4)
                        DW_AT_byte_size	(0x04)
                        DW_AT_bit_size	(0x20)
                        DW_AT_bit_offset	(0xfffffffffffffff9)
                        DW_AT_data_member_location	(0x00)

0x000000c3:           NULL

0x000000c4:         DW_TAG_variant
                      DW_AT_discr_value	(0x01)

0x000000c6:           DW_TAG_member
                        DW_AT_name	("T2")
                        DW_AT_type	(0x000000d3 "T2")
                        DW_AT_decl_file	("variants_test.pas;5")
                        DW_AT_decl_line	(4)
                        DW_AT_data_member_location	(0x00)

0x000000d2:           NULL

0x000000d3:         DW_TAG_variant_part
                      DW_AT_discr	(0x000000e0)
                      DW_AT_type	(0x000001de "INTEGER")
                      DW_AT_name	("T2")

0x000000e0:           DW_TAG_member
                        DW_AT_name	("T2")
                        DW_AT_type	(0x000001de "INTEGER")
                        DW_AT_decl_file	("variants_test.pas;5")
                        DW_AT_decl_line	(4)
                        DW_AT_data_member_location	(0x04)

0x000000ec:           DW_TAG_variant
                        DW_AT_discr_list	(<0x07> 01 0b 14 00 03 00 00 )

0x000000f5:             DW_TAG_member
                          DW_AT_name	("T20")
                          DW_AT_type	(0x000001de "INTEGER")
                          DW_AT_decl_file	("variants_test.pas;5")
                          DW_AT_decl_line	(4)
                          DW_AT_byte_size	(0x04)
                          DW_AT_bit_size	(0x20)
                          DW_AT_bit_offset	(0xfffffffffffffff9)
                          DW_AT_data_member_location	(0x08)

0x0000010b:             NULL

0x0000010c:           DW_TAG_variant
                        DW_AT_discr_value	(0x01)

0x0000010e:             DW_TAG_member
                          DW_AT_name	("T21")
                          DW_AT_type	(0x000001de "INTEGER")
                          DW_AT_decl_file	("variants_test.pas;5")
                          DW_AT_decl_line	(4)
                          DW_AT_byte_size	(0x04)
                          DW_AT_bit_size	(0x20)
                          DW_AT_bit_offset	(0xfffffffffffffff9)
                          DW_AT_data_member_location	(0x08)

0x00000124:             NULL

0x00000125:           DW_TAG_variant

0x00000126:             DW_TAG_member
                          DW_AT_name	("T2O")
                          DW_AT_type	(0x000001de "INTEGER")
                          DW_AT_decl_file	("variants_test.pas;5")
                          DW_AT_decl_line	(4)
                          DW_AT_byte_size	(0x04)
                          DW_AT_bit_size	(0x20)
                          DW_AT_bit_offset	(0xfffffffffffffff9)
                          DW_AT_data_member_location	(0x08)

0x0000013c:             NULL

0x0000013d:           NULL

0x0000013e:         DW_TAG_variant
                      DW_AT_discr_value	(0x00)

0x00000140:           DW_TAG_member
                        DW_AT_name	("F1")
                        DW_AT_type	(0x000001e5 "S_FLOAT")
                        DW_AT_decl_file	("variants_test.pas;5")
                        DW_AT_decl_line	(4)
                        DW_AT_byte_size	(0x04)
                        DW_AT_bit_size	(0x20)
                        DW_AT_bit_offset	(0xfffffffffffffff9)
                        DW_AT_data_member_location	(0x00)

0x00000156:           NULL

0x00000157:         DW_TAG_variant
                      DW_AT_discr_value	(0x00)

0x00000159:           DW_TAG_member
                        DW_AT_type	(0x00000162 "")
                        DW_AT_decl_file	("variants_test.pas;5")
                        DW_AT_decl_line	(4)
                        DW_AT_data_member_location	(0x00)

0x00000161:           NULL

0x00000162:         DW_TAG_variant_part
                      DW_AT_discr	(0x00000167)

0x00000167:           DW_TAG_volatile_type
                        DW_AT_type	(0x000001ec "CHAR")
                        DW_AT_byte_size	(0x01)
                        DW_AT_bit_size	(0x00)
                        DW_AT_bit_offset	(0x08)
                        DW_AT_data_member_location	(0x00)

0x00000170:           DW_TAG_variant
                        DW_AT_discr_list	(<0x03> 01 41 5a )

0x00000175:             DW_TAG_member
                          DW_AT_name	("CA")
                          DW_AT_type	(0x000001f3 "subrange INT32")
                          DW_AT_decl_file	("variants_test.pas;5")
                          DW_AT_decl_line	(4)
                          DW_AT_byte_size	(0x01)
                          DW_AT_bit_size	(0x03)
                          DW_AT_bit_offset	(0xfffffffffffffffe)
                          DW_AT_data_member_location	(0x04)

0x0000018b:             NULL

0x0000018c:           DW_TAG_variant
                        DW_AT_discr_list	(<0x03> 01 41 5a )

0x00000191:             DW_TAG_member
                          DW_AT_name	("CZ")
                          DW_AT_type	(0x000001fb "subrange INT32")
                          DW_AT_decl_file	("variants_test.pas;5")
                          DW_AT_decl_line	(4)
                          DW_AT_byte_size	(0x01)
                          DW_AT_bit_size	(0x02)
                          DW_AT_bit_offset	(0x04)
                          DW_AT_data_member_location	(0x05)

0x000001a0:             NULL

0x000001a1:           DW_TAG_variant
                        DW_AT_discr_list	(<0x03> 01 30 39 )

0x000001a6:             DW_TAG_member
                          DW_AT_name	("C0")
                          DW_AT_type	(0x000001e5 "S_FLOAT")
                          DW_AT_decl_file	("variants_test.pas;5")
                          DW_AT_decl_line	(4)
                          DW_AT_byte_size	(0x04)
                          DW_AT_bit_size	(0x20)
                          DW_AT_bit_offset	(0xfffffffffffffff9)
                          DW_AT_data_member_location	(0x04)

0x000001bc:             NULL
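
The DW_AT_discr_list blocks in the dump above can be read by hand: each entry starts with a descriptor byte, DW_DSC_label (0) followed by one LEB128 value, or DW_DSC_range (1) followed by a low and a high LEB128 value. Below is a minimal decoder sketch. It assumes unsigned discriminant values (DWARF uses signed or unsigned LEB128 depending on the discriminant's type), and the function names are illustrative.

```python
DW_DSC_label = 0x00
DW_DSC_range = 0x01


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            return value, pos


def decode_discr_list(block):
    """Return (low, high) pairs; a single label is reported with low == high."""
    entries = []
    pos = 0
    while pos < len(block):
        kind = block[pos]
        pos += 1
        if kind == DW_DSC_label:
            value, pos = read_uleb128(block, pos)
            entries.append((value, value))
        elif kind == DW_DSC_range:
            low, pos = read_uleb128(block, pos)
            high, pos = read_uleb128(block, pos)
            entries.append((low, high))
        else:
            raise ValueError(f"unknown discriminant descriptor {kind:#x}")
    return entries


# The block emitted for the Pascal case labels "0,3,11..20".
print(decode_discr_list(bytes.fromhex("010b1400030000")))
# The block emitted for 'A'..'Z'.
print(decode_discr_list(bytes.fromhex("01415a")))
```

The first block decodes to the range 11..20 followed by the labels 3 and 0, and the second to the range 65..90, the character codes of 'A' and 'Z'.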

Hope this will help.

tromey April 23, 2025, 2:59pm 5

Hi. Thanks for your response.

I don’t know what call_code means. Could you elaborate on that?

Also, are your LLVM patches available somewhere? I am actively working on this and I would rather not duplicate work if that is possible.

tromey April 23, 2025, 3:01pm 6

Hi. Variant parts are also on my to-do list. I was planning to tackle dynamic sizes and offsets first, because they seemed like a prerequisite.

Anyway – are your LLVM patches for this available somewhere?

tromey May 8, 2025, 2:00pm 7

I went ahead and sent a variant part patch: “Two DWARF variant part improvements”, Pull Request #138953 in llvm/llvm-project on GitHub.

I’m going to look at dynamic sizes and bit offsets next.

tromey May 8, 2025, 3:43pm 8

That patch is necessary for Ada, but not sufficient, as in Ada a variant can have multiple members.

Also, I discovered that DIDerivedType::getExtraData is overloaded for both variants and bit fields, meaning that in the current setup one cannot have a variant that is a bit field. This touches on dynamic bit sizes, since it turns out that the “storage offset” (which is different from the offset-in-bits) is stored here.

I’m not totally sure how I will untangle this but I think allowing a variant to hold multiple member types may be relatively straightforward.