Searching for GSYM documentation

March 13, 2025, 10:38pm 1

Hey folks, a colleague mentioned that the GSYM data format in LLVM might be useful for some symbolization applications we have internally. I went looking for documentation on GSYM, like what the structure is, how to use it, what it’s for, but I wasn’t able to find anything except for the README.md file in the original Phabricator review from 2018. I can’t find anything else about it aside from the source code.

There is a recent RFC with active contributions from @clayborg, @alx32 and @kyulee-com relating to new DWARF call site information, so clearly somebody is using it for something.

At a high level, my understanding of GSYM is that it’s basically an indexed DWARF format. It’s the data format that your online addr2line symbolization tool needs to process crash dumps, rather than the raw DWARF, which is more like a collection of records that roughly maps back to source code constructs. Is that reasonably accurate? Ideally, any more information on what this format does, how to use it, etc., could be written up and added to llvm-project/llvm/docs/GSYM.md.

alx32 March 13, 2025, 10:50pm 2

Hey,

Somewhat. It is an indexed debug info format meant for efficient symbolication: fast loading into memory, fast lookups, and only the minimal information needed to support symbolication. DWARF is more comprehensive, containing all possible debug information, e.g. local variable names, the memory locations of local variables, and so on.

That is just a summary. Was there anything in particular you were looking for?

Here is also an AI summary that I’ve manually verified to be correct (except for the DWARF internals, which I’m not that familiar with):

Overview of GSYM

GSYM is a compact, index-oriented debugging symbol format developed under the LLVM project. It was originally introduced to provide a lightweight way to symbolize stack traces, especially for production or post-mortem scenarios where you only need basic function/line information rather than the full richness (and overhead) of a traditional debug format like DWARF. GSYM is used by tools such as LLDB and llvm-symbolizer as an alternative or a supplement to DWARF.

The main goal of GSYM is to store just enough information to map instruction addresses back to function names and line numbers in the source code. It is designed to be:


Key Differences Compared to DWARF

1. Scope and Complexity

2. File Size and Storage

3. Read/Access Patterns

4. Supported Information

5. Typical Use Cases

6. Availability and Integration


Summary

GSYM is a lightweight symbol format aimed at quick lookup of function boundaries and line information in symbolic backtraces. It differs from DWARF by storing only the essential information needed for address-to-line/name mapping, which results in a simpler, smaller, and faster-to-load structure. In contrast, DWARF provides a comprehensive suite of debugging data (including full type information, variable scopes, and more), making it indispensable for full-fledged debugging sessions.
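To make the "indexed lookup" idea concrete, here is a minimal sketch of address-to-function/line resolution over a sorted table. This is not GSYM's actual on-disk layout or API; `FunctionEntry`, `SymbolIndex`, and the sample addresses are all hypothetical names and data made up for illustration. It only shows why a sorted, function-keyed index makes symbolication a pair of binary searches.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FunctionEntry:
    """Hypothetical record: one function's address range plus its line rows."""
    start: int
    size: int
    name: str
    # Rows sorted by address: (address, file, line).
    rows: Tuple[Tuple[int, str, int], ...]


class SymbolIndex:
    """Illustrative index only; not the real GSYM structure."""

    def __init__(self, functions):
        # Sort once so every lookup can binary-search the start addresses.
        self.functions = sorted(functions, key=lambda f: f.start)
        self.starts = [f.start for f in self.functions]

    def lookup(self, addr: int) -> Optional[Tuple[str, Optional[str], Optional[int]]]:
        # Find the last function whose start address is <= addr.
        i = bisect_right(self.starts, addr) - 1
        if i < 0:
            return None
        fn = self.functions[i]
        # Reject addresses that fall in a gap after the function ends.
        if addr >= fn.start + fn.size:
            return None
        # Within the function, find the last line row at or before addr.
        row_addrs = [row[0] for row in fn.rows]
        j = bisect_right(row_addrs, addr) - 1
        if j < 0:
            return (fn.name, None, None)
        _, file, line = fn.rows[j]
        return (fn.name, file, line)


# Made-up sample data: two functions with a gap between them.
INDEX = SymbolIndex([
    FunctionEntry(0x1040, 0x10, "helper", ((0x1040, "util.c", 10),)),
    FunctionEntry(0x1000, 0x20, "main",
                  ((0x1000, "main.c", 3), (0x1010, "main.c", 5))),
])

print(INDEX.lookup(0x1014))
```

The point of the sketch is the access pattern: a symbolizer only has to load a sorted address table and the rows for the one function it hits, rather than walking a tree of debug records.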

For scenarios where you need to perform detailed interactive debugging, DWARF remains the necessary choice. However, if your goal is to efficiently convert raw program counters into human-readable stack traces (especially in production or profiling environments), GSYM offers a compelling alternative.

rnk March 14, 2025, 6:39pm 3

Thanks! That pretty much confirms my understanding. Two things though:

  1. Can you please send a PR to document GSYM? Even just taking the AI-generated post as a starting point would be good: it blesses it as correct and makes it part of the training set for future AI-driven search queries.
  2. Can you confirm whether or not GSYM tracks inlined call frames? The generated post says it doesn’t, but I find that pretty surprising. For our internal profiling and profile-driven-optimization applications, we’ve found that inlined call frames are critical to both human understanding of application performance and profile-guided optimization (PGO). Adding this capability could help reduce the overhead of PGO/FDO tech.