Demangled names in debug info (PDB) for Swift or other non-C++ languages

January 18, 2024, 11:02pm 1

Hi,

When the Swift compiler generates PDB (CodeView) files, it currently writes demangled names (in the LF_FUNC_ID records) for user-defined functions, but mangled names for compiler-synthesized functions that have no source-level name, such as closures.

We are considering switching some of the compiler-synthesized entities to demangled names to make backtraces more readable for application developers.

For example,
$s7Logging0A6SystemO4lock33_C9FF2A9F0C61C813477A18DFECB0159ALL_WZ —> one-time initialization function for lock
$s7Logging6LoggerV13MetadataValueO11descriptionSSvgSSAEXEfU_ —> closure #1 (Logger.MetadataValue) -> String in Logger.MetadataValue.description.getter

As we understand it, C++ uses mangled names and the Windows tools rely on UnDecorateSymbolName to demangle them as needed, while Rust currently demangles all the names in the PDB because the tools don't understand Rust mangling.

Do folks have insights as to whether this could cause problems?
For example,

- It would increase the PDB file size (demangled names tend to be larger).
- Some tools such as WPA and WinDbg may make assumptions about the names in PDBs, such as no whitespace or special characters.
- Some tools may rely on heuristics about the form of names or declarations to decode additional information (by making assumptions about what the associated compilers emit).
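
To make the whitespace and special-character concern concrete, here is a minimal sketch of a check that could be run over candidate names before they are written to the PDB. The helper `hasUnusualChars` and its allowed character set are hypothetical, chosen only for illustration; they are not a documented rule of WPA, WinDbg, or the PDB format.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns true if `name` contains whitespace or any
// character other than ASCII letters, digits, '_', '$', '@', '?' or '.'.
// The allowed set is an assumption for illustration only; it does not
// describe what any particular tool actually accepts.
bool hasUnusualChars(const std::string &name) {
  for (unsigned char c : name) {
    if (std::isalnum(c))
      continue;
    if (c == '_' || c == '$' || c == '@' || c == '?' || c == '.')
      continue;
    return true; // whitespace, '#', '(', ')', '-', '>', etc.
  }
  return false;
}
```

With this set, the mangled names in the examples above pass the check, while both demangled forms are flagged because they contain spaces (and, in the closure case, `#`, parentheses, and `->`).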

Thank you!

rnk January 22, 2024, 10:43pm 2

What I recall is that classes and namespaces are represented as parent scopes, so you get something like this:

$ cat t.cpp
int foo(int a , int b) {
        return a + b;
}
namespace Bar {
int baz(int a , int b) {
        return a + b;
}
}
struct Qux {
        void method();
};
void Qux::method() {}

$ clang -c t.cpp  -g --target=x86_64-windows-msvc && llvm-pdbutil dump -types t.o | grep -B4 -A4 LF_M\\?F
         0x0074 (int): `int`
0x1001 | LF_PROCEDURE [size = 16]
         return type = 0x0074 (int), # args = 2, param list = 0x1000
         calling conv = cdecl, options = None
0x1002 | LF_FUNC_ID [size = 16]
         name = foo, type = 0x1001, parent scope = <no type>
0x1003 | LF_STRING_ID [size = 12] ID: <no type>, String: Bar
0x1004 | LF_FUNC_ID [size = 16]
         name = baz, type = 0x1001, parent scope = 0x1003
0x1005 | LF_STRUCTURE [size = 36] `Qux`
         unique name: `.?AUQux@@`
         vtable: <no type>, base list: <no type>, field list: <no type>
         options: forward ref | has unique name, sizeof 0
0x1006 | LF_POINTER [size = 12]
         referent = 0x1005, mode = pointer, opts = const, kind = ptr64
0x1007 | LF_ARGLIST [size = 8]
0x1008 | LF_MFUNCTION [size = 28]
         return type = 0x0003 (void), # args = 0, param list = 0x1007
         class type = 0x1005, this type = 0x1006, this adjust = 0
         calling conv = cdecl, options = None
0x1009 | LF_FIELDLIST [size = 20]
         - LF_ONEMETHOD [name = `method`]
           type = 0x1008, vftable offset = -1, attrs = public
0x100A | LF_STRUCTURE [size = 36] `Qux`
         unique name: `.?AUQux@@`
--
         options: has unique name, sizeof 1
0x100B | LF_STRING_ID [size = 60] ID: <no type>, String: .../t.cpp
0x100C | LF_UDT_SRC_LINE [size = 16]
         udt = 0x100A, file = 4107, line = 9
0x100D | LF_MFUNC_ID [size = 20]
         name = method, type = 0x1008, class type = 0x1005
0x100E | LF_POINTER [size = 12]
         referent = 0x1005, mode = pointer, opts = None, kind = ptr64
0x100F | LF_STRING_ID [size = 56] ID: <no type>, String: .../t.cpp

The LF_FUNC_ID and LF_MFUNC_ID identifiers are the basenames, with no parameter types, and with the scopes represented separately.
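
As a rough illustration of that split, the sketch below models a record as a base name plus a separately stored scope and rebuilds a display name from the two. The struct `FuncIdSketch`, the function `qualifiedName`, and the `::` separator are hypothetical stand-ins, not the actual CodeView record layout or how any particular debugger formats names.

```cpp
#include <string>

// Simplified, hypothetical stand-in for an LF_FUNC_ID / LF_MFUNC_ID record.
// The real records reference the scope by type index; here the scope is
// already resolved to a string, and an empty string means "no parent scope".
struct FuncIdSketch {
  std::string baseName; // e.g. "baz" or "method", no parameter types
  std::string scope;    // e.g. "Bar" or "Qux"; empty for "foo"
};

// Rebuild a qualified display name from the scope and the base name.
// Joining with "::" is an assumption made for this C++ example.
std::string qualifiedName(const FuncIdSketch &id) {
  if (id.scope.empty())
    return id.baseName;
  return id.scope + "::" + id.baseName;
}
```

For the dump above, `{"foo", ""}` yields `foo`, `{"baz", "Bar"}` yields `Bar::baz`, and `{"method", "Qux"}` yields `Qux::method`. A fully demangled Swift name stored as the base name would not decompose this way.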

I think this might power unqualified name lookup in the debugger, and maybe you should name your Swift-synthesized functions in whatever way would be easiest to name in the debugger.

I hope that helps.