[RFC] Dynamic sizes and field offsets in DWARF
I’ve been working on improving the DWARF output of an LLVM-based Ada compiler. I’ve landed a few relatively straightforward patches so far, but the next set of problems requires bigger changes, so I thought I would post here.
The main issue is that Ada is more dynamic than C or C++. My focus in this post is that, in Ada, the size of a given type can vary, as can the offsets of fields.
This means that these fields in DIType:
uint64_t SizeInBits;
uint64_t OffsetInBits;
… will need to be expressions in the general case. I still haven’t worked out all of the details of how to make this change.
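To make the problem concrete, here is a minimal Python sketch (not LLVM code) of a record whose layout depends on a discriminant. The record shape, the field names, and the `resolve` and `layout` helpers are all hypothetical, and byte-granular packing with no padding is assumed purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Union

# A value that is either known at compile time (int) or must be computed
# from the object's discriminants at run time (a callable).
Dynamic = Union[int, Callable[[dict], int]]


def resolve(value: Dynamic, discriminants: dict) -> int:
    """Return a concrete number, evaluating the expression if needed."""
    return value(discriminants) if callable(value) else value


@dataclass
class Field:
    name: str
    size_in_bits: Dynamic
    offset_in_bits: Dynamic


# Hypothetical record: a 32-bit discriminant "n", an array of n bytes,
# then a 32-bit integer placed right after the array (no padding assumed).
fields = [
    Field("n", 32, 0),
    Field("data", lambda d: 8 * d["n"], 32),
    Field("tail", 32, lambda d: 32 + 8 * d["n"]),
]


def layout(discriminants: dict) -> dict:
    """Map each field name to its concrete (offset, size) in bits."""
    return {
        f.name: (resolve(f.offset_in_bits, discriminants),
                 resolve(f.size_in_bits, discriminants))
        for f in fields
    }
```

In this sketch only `n` can be described with two plain integers: `data` has a constant offset but a run-time size, and `tail` has a constant size but a run-time offset.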
One extra difficulty is that a packed record can have a field that is at a dynamic bit offset. I posted about this recently on dwarf-discuss:
https://lists.dwarfstd.org/pipermail/dwarf-discuss/2025-April/002666.html
The gist is that there is no way to represent this in DWARF 5.
The one response on the dwarf-discuss thread suggested using a DWARF extension: have the compiler allow an expression for DW_AT_data_bit_offset. This would be fine by me, but I wonder whether LLVM would accept it.
Now, GNAT doesn’t seem to need the full generality here. In particular, from what I can tell, the byte offset of a field may be non-constant, but the bit offset within that byte will always be constant.
Taking advantage of this observation, GCC emits a DWARF 3 construct to work around this problem. That is, it emits an expression for DW_AT_data_member_location, plus a constant DW_AT_bit_offset. (This approach was deprecated in DWARF 4 and removed entirely from DWARF 5.)
So one idea would be to replicate this behavior for LLVM. It’s somewhat ugly, though, because in addition to the deprecation, two attributes would be needed where, ideally, just one would do.
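As a rough illustration of how a consumer combines the two attributes, the Python sketch below computes a field's bit position from a run-time byte offset and a constant DWARF 3 style bit offset. The function name and parameters are hypothetical. The little-endian branch assumes the usual reading of DW_AT_bit_offset as a count from the most significant bit of the storage unit.

```python
from typing import Callable


def field_bit_position(
    byte_offset_expr: Callable[[dict], int],
    storage_bytes: int,
    bit_size: int,
    bit_offset: int,
    discriminants: dict,
    little_endian: bool = True,
) -> int:
    """Bit position of a field, as DW_AT_data_bit_offset would express it.

    byte_offset_expr stands in for a DW_AT_data_member_location expression;
    storage_bytes, bit_size and bit_offset stand in for the constant
    DW_AT_byte_size, DW_AT_bit_size and DW_AT_bit_offset attributes.
    """
    # The only run-time part: where the storage unit starts.
    base = 8 * byte_offset_expr(discriminants)
    if little_endian:
        # DW_AT_bit_offset counts from the most significant bit of the
        # storage unit, so flip it to count from the least significant bit.
        return base + 8 * storage_bytes - bit_offset - bit_size
    return base + bit_offset
```

For example, with a hypothetical byte offset of `4 + n`, a one-byte storage unit, and a 3-bit field at bit offset 2, `field_bit_position(lambda d: 4 + d["n"], 1, 3, 2, {"n": 6})` gives 83 on a little-endian target. Only the byte offset needs an expression; the bit-level arithmetic stays constant.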
It may be worth noting that, in Ada, not every type can have a variable size – I think just arrays and records. So, maybe there’s some possible refactoring where a constant size is available in many cases.
I’ve gone back and forth a few times, but my current plan is to go with the “DWARF 3” approach: for the offset, add an optional “dynamic offset in bytes” to DIType and, when this is set, generate the deprecated DWARF.
For the size, I suppose I will try changing SizeInBits to allow an expression. One wrinkle is that such an expression would necessarily compute the size in bytes, whereas the constant form is in bits. So, the API may be a little inconsistent.
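One way to picture the unit mismatch is a small wrapper that accepts either form and normalizes on read. This is only a Python sketch of the idea; the `TypeSize` class and its method are hypothetical and say nothing about how the LLVM metadata would actually be shaped.

```python
from typing import Callable, Optional


class TypeSize:
    """Size that is either a constant in bits or an expression in bytes."""

    def __init__(self, bits: Optional[int] = None,
                 byte_expr: Optional[Callable[[dict], int]] = None):
        # Exactly one of the two forms must be supplied.
        if (bits is None) == (byte_expr is None):
            raise ValueError("give either bits or byte_expr")
        self.bits = bits
        self.byte_expr = byte_expr

    def in_bits(self, discriminants: dict) -> int:
        """Normalize both forms to bits so callers see one unit."""
        if self.bits is not None:
            return self.bits
        return 8 * self.byte_expr(discriminants)
```

A constant `TypeSize(bits=12)` can describe a size that is not a whole number of bytes, while the expression form cannot, which is the inconsistency mentioned above.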
I’d appreciate any thoughts you may have.
We support such a thing (run-time positioned/sized fields) in our Pascal compiler on Itanium (not LLVM based) using DWARF 3. We generate extra routines to compute/return the sizes and then use call_code in the data member location for the member and in the size field for the member’s type. This is one of the reasons we are sticking with DWARF 4 for OpenVMS.
For example,
type
dynrec(i:integer) = record
! Field with run-time size
f1 : packed array [1..i] of boolean;
! Field with run-time size AND run-time position
f2 : packed array [1..i] of boolean;
! Just an integer with run-time position
f3 : integer;
end;
procedure nested(p1:integer);
var d : dynrec(p1);
begin
d := zero;
d.f1[1] := true;
d.f2[p1]:= true;
d.f3 := 1234;
writeln(d.f1[1],d.f2[p1],d.f3);
end;
generates things like
00001ab3 000000d3 0000000a [0a] (base type) (level: 3)
00001ab4 000000d4 encoding: 05 (5) (signed)
00001ab5 000000d5 byte size: [04] 0000000000000004 (4)
00001ab6 000000d6 0000000b [0b] (array) (level: 3)
00001ab7 000000d7 type: ref: 00000146 (00000146)
00001abb 000000db bit stride: [01] 0000000000000001 (1)
00001abc 000000dc 0000000c [0c] (subrange) (level: 4)
00001abd 000000dd type: ref: 000000d3 (000000d3)
00001ac1 000000e1 lower bound: [01] 0000000000000001 (1)
00001ac2 000000e2 upper bound: block length: [09] 9 (9)
00001ac3 000000e3 e7 (call_code) $CODE$ + 810
00001acc 000000ec 00000000 [00] (NULL) (level: 4)
00001ae6 00000106 0000000e [0e] (member) (level: 4)
00001ae7 00000107 name: "F1"
00001aea 0000010a type: ref: 000000d6 (000000d6)
00001aee 0000010e data member location: block length: [03] 3 (3)
00001aef 0000010f 10 (constu) [00] 0000000000000000 (0)
00001af1 00000111 22 (plus)
and all of that allows our OpenVMS debugger to print it at the writeln
DBG> examine d
T\T\NESTED\D
F1
[1]: TRUE
[2]-[10]: FALSE
F2
[1]-[9]: FALSE
[10]: TRUE
F3: 1234
We use those helper routines since our proprietary backend doesn’t let me write out more complicated location expressions. If it could, it should work.
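For readers unfamiliar with location blocks, the three-byte block shown for F1 above (10 00 22, that is, constu 0 followed by plus) can be evaluated with a very small stack machine. The Python sketch below handles only three standard operations and assumes the usual rule that the address of the containing object is pushed before a data member location expression runs. The vendor-specific call_code operation (opcode e7 in the dump) is deliberately not modelled, and the function names are hypothetical.

```python
from typing import Tuple

DW_OP_constu = 0x10
DW_OP_plus = 0x22
DW_OP_plus_uconst = 0x23


def read_uleb128(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one unsigned LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def member_address(expr: bytes, object_address: int) -> int:
    """Evaluate a data-member-location block for one object."""
    # The address of the containing object is pushed before evaluation.
    stack = [object_address]
    pos = 0
    while pos < len(expr):
        op = expr[pos]
        pos += 1
        if op == DW_OP_constu:
            value, pos = read_uleb128(expr, pos)
            stack.append(value)
        elif op == DW_OP_plus:
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs)
        elif op == DW_OP_plus_uconst:
            value, pos = read_uleb128(expr, pos)
            stack.append(stack.pop() + value)
        else:
            raise NotImplementedError(f"opcode {op:#x} not handled in this sketch")
    return stack[-1]
```

Calling `member_address(bytes([0x10, 0x00, 0x22]), base)` returns `base` unchanged, consistent with F1 sitting at the start of the record. A run-time position would replace the constant with something computed, which is where the helper routines come in.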
For x86 and LLVM, we are actually working on this very thing now using a similar scheme. I’ll point my engineers who are doing this work to this discussion to talk about what we are doing. As far as I know, we are not adding new DWARF but just adding some DI metadata for such things and then extending the DWARF generation code.
In Objective-C, for example, the offsets of instance variables in an object are not (necessarily) known at compile time, and LLDB knows that and ignores the offsets, which results in DWARF like this:
$ cat /tmp/t.m
@interface Foo {
int i, j;
};
@end;
Foo *foo;
$ dwarfdump /tmp/t.o --name Foo --show-children
/tmp/t.o: file format Mach-O arm64
0x00000031: DW_TAG_structure_type
DW_AT_name ("Foo")
DW_AT_byte_size (0x08)
DW_AT_decl_file ("/tmp/t.m")
DW_AT_decl_line (1)
DW_AT_APPLE_runtime_class (DW_LANG_ObjC)
0x00000037: DW_TAG_member
DW_AT_name ("i")
DW_AT_type (0x0000004c "int")
DW_AT_decl_file ("/tmp/t.m")
DW_AT_decl_line (2)
DW_AT_data_member_location (0x00)
DW_AT_accessibility (DW_ACCESS_protected)
0x00000041: DW_TAG_member
DW_AT_name ("j")
DW_AT_type (0x0000004c "int")
DW_AT_decl_file ("/tmp/t.m")
DW_AT_decl_line (2)
DW_AT_data_member_location (0x00)
DW_AT_accessibility (DW_ACCESS_protected)
0x0000004b: NULL
It would be nice to have the option to not emit a data member location or to emit an expression for it (I don’t think we could produce one for Objective-C; we need to ask the runtime, but maybe you can for Ada).
ararmine April 23, 2025, 12:37pm 4
Our LLVM-based Pascal compiler on x86 OpenVMS generates DWARF info for variant members, which are similar to the example discussed in dwarf-discuss.
LLVM-10 already had support for variant members with a single constant discriminant value.
One can call the DIBuilder::createVariantMemberType API to emit DIDerivedType metadata with the DW_TAG_member tag, which is subsequently processed in the back-end.
To support a list of discriminants, we extended the DIDerivedType class with a new operand of type DINodeArray, changed the signature of createVariantMemberType to take a DINodeArray argument, and implemented the corresponding DWARF output in the back-end.
For example:
program dbgrecvars;
type
rec1 = packed record
part: 1..99;
case boolean of
true : (
t1 : integer;
case t2 : integer of
0,3,11..20: ( t20 : integer ) ;
1: ( t21 : integer ) ;
otherwise ( t2o : integer )
);
false : (
f1 : real;
case char of
'A'..'Z' : (
cA : [bit(3)] 0..7;
cZ : [bit(2)] 0..3
);
'0'..'9' : (
c0 : real;
)
);
end;
var
vrec : rec1;
Output of llvm-dwarfdump:
0x0000009d: DW_TAG_variant_part
DW_AT_discr (0x000000a2)
0x000000a2: DW_TAG_volatile_type
DW_AT_type (0x00000047 "BOOLEAN")
DW_AT_byte_size (0x04)
DW_AT_bit_size (0x00)
DW_AT_bit_offset (0x20)
DW_AT_data_member_location (0x00)
0x000000ab: DW_TAG_variant
DW_AT_discr_value (0x01)
0x000000ad: DW_TAG_member
DW_AT_name ("T1")
DW_AT_type (0x000001de "INTEGER")
DW_AT_decl_file ("variants_test.pas;5")
DW_AT_decl_line (4)
DW_AT_byte_size (0x04)
DW_AT_bit_size (0x20)
DW_AT_bit_offset (0xfffffffffffffff9)
DW_AT_data_member_location (0x00)
0x000000c3: NULL
0x000000c4: DW_TAG_variant
DW_AT_discr_value (0x01)
0x000000c6: DW_TAG_member
DW_AT_name ("T2")
DW_AT_type (0x000000d3 "T2")
DW_AT_decl_file ("variants_test.pas;5")
DW_AT_decl_line (4)
DW_AT_data_member_location (0x00)
0x000000d2: NULL
0x000000d3: DW_TAG_variant_part
DW_AT_discr (0x000000e0)
DW_AT_type (0x000001de "INTEGER")
DW_AT_name ("T2")
0x000000e0: DW_TAG_member
DW_AT_name ("T2")
DW_AT_type (0x000001de "INTEGER")
DW_AT_decl_file ("variants_test.pas;5")
DW_AT_decl_line (4)
DW_AT_data_member_location (0x04)
0x000000ec: DW_TAG_variant
DW_AT_discr_list (<0x07> 01 0b 14 00 03 00 00 )
0x000000f5: DW_TAG_member
DW_AT_name ("T20")
DW_AT_type (0x000001de "INTEGER")
DW_AT_decl_file ("variants_test.pas;5")
DW_AT_decl_line (4)
DW_AT_byte_size (0x04)
DW_AT_bit_size (0x20)
DW_AT_bit_offset (0xfffffffffffffff9)
DW_AT_data_member_location (0x08)
0x0000010b: NULL
0x0000010c: DW_TAG_variant
DW_AT_discr_value (0x01)
0x0000010e: DW_TAG_member
DW_AT_name ("T21")
DW_AT_type (0x000001de "INTEGER")
DW_AT_decl_file ("variants_test.pas;5")
DW_AT_decl_line (4)
DW_AT_byte_size (0x04)
DW_AT_bit_size (0x20)
DW_AT_bit_offset (0xfffffffffffffff9)
DW_AT_data_member_location (0x08)
0x00000124: NULL
0x00000125: DW_TAG_variant
0x00000126: DW_TAG_member
DW_AT_name ("T2O")
DW_AT_type (0x000001de "INTEGER")
DW_AT_decl_file ("variants_test.pas;5")
DW_AT_decl_line (4)
DW_AT_byte_size (0x04)
DW_AT_bit_size (0x20)
DW_AT_bit_offset (0xfffffffffffffff9)
DW_AT_data_member_location (0x08)
0x0000013c: NULL
0x0000013d: NULL
0x0000013e: DW_TAG_variant
DW_AT_discr_value (0x00)
0x00000140: DW_TAG_member
DW_AT_name ("F1")
DW_AT_type (0x000001e5 "S_FLOAT")
DW_AT_decl_file ("variants_test.pas;5")
DW_AT_decl_line (4)
DW_AT_byte_size (0x04)
DW_AT_bit_size (0x20)
DW_AT_bit_offset (0xfffffffffffffff9)
DW_AT_data_member_location (0x00)
0x00000156: NULL
0x00000157: DW_TAG_variant
DW_AT_discr_value (0x00)
0x00000159: DW_TAG_member
DW_AT_type (0x00000162 "")
DW_AT_decl_file ("variants_test.pas;5")
DW_AT_decl_line (4)
DW_AT_data_member_location (0x00)
0x00000161: NULL
0x00000162: DW_TAG_variant_part
DW_AT_discr (0x00000167)
0x00000167: DW_TAG_volatile_type
DW_AT_type (0x000001ec "CHAR")
DW_AT_byte_size (0x01)
DW_AT_bit_size (0x00)
DW_AT_bit_offset (0x08)
DW_AT_data_member_location (0x00)
0x00000170: DW_TAG_variant
DW_AT_discr_list (<0x03> 01 41 5a )
0x00000175: DW_TAG_member
DW_AT_name ("CA")
DW_AT_type (0x000001f3 "subrange INT32")
DW_AT_decl_file ("variants_test.pas;5")
DW_AT_decl_line (4)
DW_AT_byte_size (0x01)
DW_AT_bit_size (0x03)
DW_AT_bit_offset (0xfffffffffffffffe)
DW_AT_data_member_location (0x04)
0x0000018b: NULL
0x0000018c: DW_TAG_variant
DW_AT_discr_list (<0x03> 01 41 5a )
0x00000191: DW_TAG_member
DW_AT_name ("CZ")
DW_AT_type (0x000001fb "subrange INT32")
DW_AT_decl_file ("variants_test.pas;5")
DW_AT_decl_line (4)
DW_AT_byte_size (0x01)
DW_AT_bit_size (0x02)
DW_AT_bit_offset (0x04)
DW_AT_data_member_location (0x05)
0x000001a0: NULL
0x000001a1: DW_TAG_variant
DW_AT_discr_list (<0x03> 01 30 39 )
0x000001a6: DW_TAG_member
DW_AT_name ("C0")
DW_AT_type (0x000001e5 "S_FLOAT")
DW_AT_decl_file ("variants_test.pas;5")
DW_AT_decl_line (4)
DW_AT_byte_size (0x04)
DW_AT_bit_size (0x20)
DW_AT_bit_offset (0xfffffffffffffff9)
DW_AT_data_member_location (0x04)
0x000001bc: NULL
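The DW_AT_discr_list blocks in the dump above are compact: each entry is a one-byte descriptor (0x00 for a single label, 0x01 for a range) followed by one or two LEB128 values. The Python sketch below decodes such a block. The function names are hypothetical, and whether values are read as signed or unsigned is left to the caller, on the assumption that it follows the discriminant's type.

```python
from typing import List, Tuple

DW_DSC_label = 0x00
DW_DSC_range = 0x01


def read_leb128(data: bytes, pos: int, signed: bool) -> Tuple[int, int]:
    """Decode one LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if (byte & 0x80) == 0:
            break
    # For signed values, bit 6 of the last byte is the sign bit.
    if signed and (byte & 0x40):
        result -= 1 << shift
    return result, pos


def decode_discr_list(block: bytes, signed: bool = False) -> List[Tuple[int, int]]:
    """Return the list as inclusive (low, high) ranges."""
    ranges = []
    pos = 0
    while pos < len(block):
        kind = block[pos]
        pos += 1
        low, pos = read_leb128(block, pos, signed)
        if kind == DW_DSC_range:
            high, pos = read_leb128(block, pos, signed)
        elif kind == DW_DSC_label:
            high = low  # a single label is a one-element range
        else:
            raise ValueError(f"unknown descriptor {kind:#x}")
        ranges.append((low, high))
    return ranges
```

Applied to the dump, `01 0b 14 00 03 00 00` decodes to the range 11..20 plus the labels 3 and 0, matching `0,3,11..20` in the source. `01 41 5a` decodes to 0x41..0x5a, that is 'A'..'Z', when read as unsigned.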
Hope this will help.
tromey April 23, 2025, 2:59pm 5
Hi. Thanks for your response.
I don’t know what call_code means. Could you elaborate on that?
Also, are your LLVM patches available somewhere? I am actively working on this and I would rather not duplicate work if that is possible.
tromey April 23, 2025, 3:01pm 6
Hi. Variant parts are also on my to-do list. I was planning to tackle dynamic sizes and offsets first, because they seemed like a prerequisite.
Anyway – are your LLVM patches for this available somewhere?
tromey May 8, 2025, 2:00pm 7
I went ahead & sent a variant part patch here: Two DWARF variant part improvements by tromey · Pull Request #138953 · llvm/llvm-project · GitHub
I’m going to look at dynamic sizes and bit offsets next.
tromey May 8, 2025, 3:43pm 8
That patch is necessary for Ada, but not sufficient, as in Ada a variant can have multiple members.
Also, I discovered that DIDerivedType::getExtraData is overloaded for both variants and bit fields, meaning that in the current setup one cannot have a variant that is a bit field. This touches on dynamic bit sizes, since it turns out that the “storage offset” (which is different from the offset-in-bits) is stored here.
I’m not totally sure how I will untangle this but I think allowing a variant to hold multiple member types may be relatively straightforward.