New Language Support for LLDB - Improving Extensibility

I was looking into implementing debugger support for Hylo and found this article, which seemed very promising: Adding Language Support

However, I also saw these two comments in the open projects page:

Finish the language abstraction and remove all the unnecessary API’s

An important part of making lldb a more useful “debugger toolkit” as opposed to a C/C++/ObjC/Swift debugger is to have a clean abstraction for language support. We did most, but not all, of the physical separation. We need to finish that. And then by force of necessity the API’s really look like the interface to a C++ type system with a few swift bits added on. How you would go about adding a new language is unclear and much more trouble than it is worth at present. But if we made this nice, we could add a lot of value to other language projects.

Make a more accessible plugin architecture for lldb

Right now, you can only use the Python or SB API’s to extend an extant lldb. You can’t implement any of the actual lldb Plugins as plugins. That means anybody that wants to add new Object file/Process/Language etc support has to build and distribute their own lldb. This is tricky because the API’s the plugins use are currently not stable (and recently have been changing quite a lot.) We would have to define a subset of lldb_private that you could use, and some way of telling whether the plugins were compatible with the lldb. But long-term, making this sort of extension possible will make lldb more appealing for research and 3rd party uses.

Could someone give an update on whether these are still accurate, or whether adding a new language is now possible without maintaining a fork? If so, are there features, such as expression evaluation support for a new language, that still wouldn't be possible?
Depending on the scale of the remaining related issues, someone from our research group may contribute to the project in the coming months. Our research goal is to explore the possibilities of making debuggers for new languages, and so far we have found LLDB the most suitable. If you know of other technologies that we can learn from, please share them too! Some may provide different levels of debuggability support, and not all new languages may aim for a full expression evaluator, so it would also be interesting to see which approaches come with which tradeoffs.

Relevant Findings

Swift has a full llvm fork, which is not a viable approach for most smaller language maintainers.

Rust uses LLDB as one of its available debuggers, and they have implemented it as a wrapper around lldb. They added type formatters, but other functionality such as expression evaluation is very limited (they implemented expression parsing outside lldb).

@DavidSpickett was looking into Fortran support for LLDB.

In that context, the blog post LLDB's TypeSystems: An Unfinished Interface might also be interesting to you.

For a while now, I’ve been wanting to create a “Swift Light” language plugin that doesn’t depend on the Swift compiler on llvm.org/main and use that to motivate generalizing some of the particularly Clang-specific abstractions and better define the boundaries between LLDB core and language plugins, so we could one day have maybe even dynamically linked language plugins.

DmT021 April 2, 2025, 1:41pm 4

FWIW, Swift’s fork has markers indicating patches specific to Swift (example)

// BEGIN SWIFT
// END SWIFT

This can be a good clue to where work may be needed beyond what the documentation already describes.

Thanks all, great resources!

I think studying what Swift is doing could be interesting. I extracted all code snippets from the llvm repo that are marked using those comments, and subsequently had ChatGPT 4.5 + deep research sitting on it for 20 minutes to gather relevant info and analyse these changes. If you want to look at the snippet sources, you can click on the code ranges in the document below each file, which direct you to the current (2025-04-02) ‘next’ branch’s source code on GitHub.
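The extraction tooling used for this analysis isn't shown in the thread. As a rough illustration of how the markers make Swift-specific regions easy to find mechanically, here is a minimal C++ sketch that counts `// BEGIN SWIFT` / `// END SWIFT` blocks and the lines between them in one file's text, assuming each marker sits on its own line (possibly indented). `CountSwiftBlocks` and `SwiftBlockStats` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical summary of the Swift-specific regions found in one source file.
struct SwiftBlockStats {
  int blocks = 0;          // number of complete BEGIN/END SWIFT pairs
  int linesInside = 0;     // lines strictly between the markers
  bool unbalanced = false; // true if the markers do not pair up
};

// Returns true if `line`, ignoring leading whitespace, starts with `marker`.
static bool StartsWithMarker(const std::string &line, const std::string &marker) {
  std::size_t first = line.find_first_not_of(" \t");
  return first != std::string::npos &&
         line.compare(first, marker.size(), marker) == 0;
}

// Scans C++ source text for "// BEGIN SWIFT" ... "// END SWIFT" regions.
SwiftBlockStats CountSwiftBlocks(const std::string &source) {
  SwiftBlockStats stats;
  std::istringstream in(source);
  std::string line;
  bool inside = false;
  while (std::getline(in, line)) {
    if (StartsWithMarker(line, "// BEGIN SWIFT")) {
      if (inside) stats.unbalanced = true; // nested BEGIN or missing END
      inside = true;
    } else if (StartsWithMarker(line, "// END SWIFT")) {
      if (inside) ++stats.blocks;
      else stats.unbalanced = true; // END without a matching BEGIN
      inside = false;
    } else if (inside) {
      ++stats.linesInside;
    }
  }
  if (inside) stats.unbalanced = true; // file ended inside a block
  return stats;
}
```

Summing these per-file results over a checkout gives totals of the kind listed in the overview; this sketch excludes the marker lines themselves, and the thread doesn't say whether the overview's line count does.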

Under that, ChatGPT’s analysis follows. People working on the Swift compiler know these parts better, but it can provide an overview and sources to look into for newcomers like me.
Feel free to leave comments directly in the google doc if you see something is incorrect / you would add something.

I felt that by the end it got a bit too optimistic about the amount of effort this API refactoring would need, but I'm also convinced that it's not impossible: Swift makes small modifications in some places, but most of its code could be written as standard LLVM plugins.

Here is the overview from the code analysis:

Total Swift-specific code blocks: 94
Total lines within Swift-specific blocks: 1363
Number of files containing Swift-specific blocks: 45
Blocks by category:
- TypeSystem: 4 blocks
- ExpressionParser: 8 blocks
- LanguageRuntime: 2 blocks
- DataFormatters: 5 blocks
- BuildSystem: 18 blocks
- CoreAPI: 10 blocks
- CommandsAndOptions: 15 blocks
- SymbolAndDebugInfo: 25 blocks
- OtherCore: 7 blocks

Swift-Specific LLDB Extensions in the Swift LLVM Fork.pdf (600.5 KB)

mib April 2, 2025, 10:25pm 6

Great analysis! I would love to see lldb support new languages more easily.

wallace April 2, 2025, 10:34pm 7

I’ll share my experience with supporting Mojo in LLDB. We made it work this way:

we created a shared library plugin that was loaded at runtime by LLDB
this shlib linked against a build of LLDB that exposed the lldb_private namespace. That’s doable via CMake
this shlib implemented a language plugin, a runtime plugin, a type system, a DWARF parser, etc.
this shlib also linked against the Mojo compiler, so the DWARF parser used the compiler underneath to generate the resulting decls and types
we shipped our debugger as a bundle of vanilla LLDB and the shlib. LLDB was slightly modified to load the shlib upon initialization, but that was it. It also worked in VS Code out of the box.
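The shlib approach above hinges on the plugin registering its factories with the host when it is loaded. The following self-contained C++ sketch models that register-a-factory pattern; `LanguagePlugin`, `PluginRegistry`, `ToyLanguage` and `InitializeMyLanguagePlugin` are hypothetical names, not LLDB's `PluginManager` API, whose real signatures differ.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical interface a language plugin implements (not LLDB's Language class).
class LanguagePlugin {
public:
  virtual ~LanguagePlugin() = default;
  virtual std::string Name() const = 0;
  // Returns true if a source file belongs to this language.
  virtual bool HandlesSourceFile(const std::string &path) const = 0;
};

// Hypothetical host-side registry: plugins register a factory at load time,
// and the debugger later asks the registry for an instance by name.
class PluginRegistry {
public:
  using Factory = std::function<std::unique_ptr<LanguagePlugin>()>;

  static PluginRegistry &Instance() {
    static PluginRegistry registry;
    return registry;
  }
  // Returns false if a plugin with this name is already registered.
  bool Register(const std::string &name, Factory factory) {
    return factories_.emplace(name, std::move(factory)).second;
  }
  std::unique_ptr<LanguagePlugin> Create(const std::string &name) const {
    auto it = factories_.find(name);
    if (it == factories_.end()) return nullptr;
    return it->second();
  }

private:
  std::map<std::string, Factory> factories_;
};

// --- Code that would live in the separately built shared library ---
class ToyLanguage : public LanguagePlugin {
public:
  std::string Name() const override { return "toy"; }
  bool HandlesSourceFile(const std::string &path) const override {
    const std::string ext = ".toy";
    return path.size() >= ext.size() &&
           path.compare(path.size() - ext.size(), ext.size(), ext) == 0;
  }
};

// Hypothetical entry point the host would look up (e.g. with dlsym) and call
// after loading the shlib.
extern "C" bool InitializeMyLanguagePlugin() {
  return PluginRegistry::Instance().Register(
      "toy", [] { return std::make_unique<ToyLanguage>(); });
}
```

Keeping the entry point `extern "C"` gives the host an unmangled symbol name to look up, while everything behind it stays ordinary C++.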

All in all it was a nice approach because the private API related to language support is very stable. We probably had 4 rebase issues in over a year.

We also did some work to clean up the API so that it’s easier to add a language plugin via a shlib.

jingham April 4, 2025, 4:42pm 8

I also think it would be great to make new languages easier to add. But I really don’t like the idea of having people rely on “accidentally stable API’s”. And if our method is just “expose all of lldb_private” then we can’t control which parts we would rather not have to worry about changing. That would put us in an unintentionally adversarial relationship with the people supporting languages we don’t know about.

We don’t have to expose these as SB API’s; most users of the SB API’s won’t be doing work at the level of the Language Plugin, and I don’t think we need to make it possible to write Language Plugins in Python. But IMO if we’re going to do this, we should make another set of API’s for each of the plugins that we are willing to maintain.
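A deliberately maintained plugin API would also need what the open-projects text quoted at the top of the thread calls "some way of telling whether the plugins were compatible with the lldb". One common approach is a major/minor version handshake, sketched below; `PluginApiVersion`, `kHostApiVersion` and the compatibility rule are hypothetical, not an existing LLDB mechanism.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical version of a maintained plugin API surface.
struct PluginApiVersion {
  std::uint32_t major; // bumped on incompatible changes
  std::uint32_t minor; // bumped when functionality is added compatibly
};

// Version the (hypothetical) host debugger implements.
constexpr PluginApiVersion kHostApiVersion{2, 3};

// A plugin built against `built` can be loaded if the major versions match
// and the host provides at least everything the plugin was compiled against.
constexpr bool IsCompatible(PluginApiVersion host, PluginApiVersion built) {
  return host.major == built.major && host.minor >= built.minor;
}

static_assert(IsCompatible(kHostApiVersion, {2, 1}), "older minor is fine");
static_assert(!IsCompatible(kHostApiVersion, {2, 4}), "needs a newer host");
static_assert(!IsCompatible(kHostApiVersion, {1, 9}), "major mismatch");
```

The plugin would report the version it was built against at load time, and the host would refuse to initialize it when `IsCompatible` fails, instead of crashing later on a changed interface.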

wallace April 5, 2025, 3:56am 9

I agree with you here.
I remember that last year I created a namespace for the DWARF plugin code that I needed to link against. It was something like lldb_private::plugin::dwarf.
It would be great to create a list of namespaces used for language support for which we commit to intentional stability.
The biggest two areas that I had to interact with were the type system and dwarf parser.
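One way to mark such an intentionally stable subset, in the spirit of the namespace mentioned above, is a dedicated namespace containing an inline version namespace: callers write the short name, but out-of-line functions and types declared there get the version baked into their mangled names, so a plugin built against an old revision fails to resolve symbols instead of silently misbehaving. The names `lldb_stable`, `language`, `v1`, `TypeSummary` and `Describe` are hypothetical.

```cpp
#include <cassert>
#include <string>

namespace lldb_stable {
namespace language {
// Entities declared here are mangled as lldb_stable::language::v1::...
// A breaking revision would add `inline namespace v2` and drop `inline` from v1.
inline namespace v1 {

struct TypeSummary {
  std::string name;
  unsigned byte_size;
};

inline std::string Describe(const TypeSummary &t) {
  return t.name + " (" + std::to_string(t.byte_size) + " bytes)";
}

} // namespace v1
} // namespace language
} // namespace lldb_stable
```

Plugin code can refer to `lldb_stable::language::TypeSummary` without naming `v1`, which keeps source changes small when the version is bumped.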

I would like to add a big +1 to this topic. Our team is currently looking into adding proper expression evaluation support for Kotlin/Native. Unfortunately, maintaining and publishing our own LLDB fork is not an option for us. We are also ready to contribute if you are looking for help!

wallace April 12, 2025, 4:18am 11

I wonder whether a good way for someone to take a stab at this would be to try an approach similar to the one I used for Mojo, described above, via a plugin shlib (I don’t work at that company anymore, so I don’t have the source code), and then mark all the necessary private APIs so that a new namespace is created for this set of APIs. This could be followed by a refactor of core LLDB that moves all those APIs into the same location in the folder structure, as a support library for language plugins.
But really, this should work with shlib plugins.

My implementation of TypeSystemRust almost exclusively touches what you’d expect: CompilerType and the Decl equivalents, lldb_private::Type, lldb_private::ValueObject, the DWARF (and eventually PDB) related structures like DWARFDIE and DWARFAttributes, and then a handful of utility objects like llvm::DenseMap (which could easily be replaced by any other HashMap implementation) and DumpDataExtractor. That’s not an exhaustive list, but those are the most important parts for the ASTParser and TypeSystem. I haven’t completed the ExpressionParser, but from the bit I’ve touched, most of it is string manipulation that doesn’t really touch LLDB at all, plus ValueObject shuffling. The Language plugin relies on some stuff from /DataFormatters.
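To give a feel for the shape of this work, here is a toy, self-contained model of a language-specific type system answering the kind of questions a debugger asks about a type (name, byte size, whether it is an aggregate), using an opaque handle much as a `CompilerType` carries an opaque pointer. `ToyTypeSystem`, `RustLikeTypeSystem` and `TypeHandle` are hypothetical; LLDB's real `TypeSystem` interface is far larger and its signatures differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical opaque handle to a type owned by a type system.
using TypeHandle = std::size_t;

// Hypothetical, heavily reduced type-system interface.
class ToyTypeSystem {
public:
  virtual ~ToyTypeSystem() = default;
  virtual std::string GetTypeName(TypeHandle t) const = 0;
  virtual std::optional<std::uint64_t> GetByteSize(TypeHandle t) const = 0;
  virtual bool IsAggregateType(TypeHandle t) const = 0;
};

// Minimal Rust-flavoured implementation: primitive types and tuples only.
class RustLikeTypeSystem : public ToyTypeSystem {
public:
  TypeHandle AddPrimitive(std::string name, std::uint64_t size) {
    types_.push_back({std::move(name), size, false});
    return types_.size() - 1;
  }
  TypeHandle AddTuple(const std::vector<TypeHandle> &elems) {
    std::string name = "(";
    std::uint64_t size = 0;
    for (std::size_t i = 0; i < elems.size(); ++i) {
      name += (i ? ", " : "") + types_[elems[i]].name;
      size += types_[elems[i]].size; // ignores padding/alignment for brevity
    }
    name += elems.size() == 1 ? ",)" : ")"; // Rust spells 1-tuples "(T,)"
    types_.push_back({std::move(name), size, true});
    return types_.size() - 1;
  }

  std::string GetTypeName(TypeHandle t) const override {
    return t < types_.size() ? types_[t].name : "<invalid>";
  }
  std::optional<std::uint64_t> GetByteSize(TypeHandle t) const override {
    if (t >= types_.size()) return std::nullopt;
    return types_[t].size;
  }
  bool IsAggregateType(TypeHandle t) const override {
    return t < types_.size() && types_[t].is_tuple;
  }

private:
  struct Entry {
    std::string name;
    std::uint64_t size;
    bool is_tuple;
  };
  std::vector<Entry> types_;
};
```

Because the size computation ignores alignment, only tuples whose elements need no padding get the same size Rust would report; a real implementation would take layout from the debug info instead.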

Overall, it could be much worse.

One major required refactor though is PDB handling (which I know isn’t relevant to all languages), since it is 100% bound to TypeSystemClang. I’ve been poking around with it, and it’s pretty reasonable to give it the same-ish API as DWARFASTParser.
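In outline, giving PDB "the same-ish API as DWARFASTParser" means the language's type system depends on a format-agnostic parser interface instead of on one debug-info format. The toy model below shows that shape; `DebugInfoASTParser`, `DwarfLikeParser`, `PdbLikeParser`, `ParsedType` and `DescribeType` are hypothetical and much simpler than LLDB's actual `DWARFASTParser`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical format-neutral result of parsing one type entry.
struct ParsedType {
  std::string name;
  std::uint64_t byte_size;
};

// Hypothetical interface the type system depends on instead of a concrete format.
class DebugInfoASTParser {
public:
  virtual ~DebugInfoASTParser() = default;
  // Translates the debug-info entry identified by `uid` into a language type.
  virtual std::optional<ParsedType> ParseType(std::uint64_t uid) const = 0;
};

// DWARF-flavoured toy: entries keyed by DIE offset with name/size attributes.
class DwarfLikeParser : public DebugInfoASTParser {
public:
  struct Die { std::string at_name; std::uint64_t at_byte_size; };
  explicit DwarfLikeParser(std::map<std::uint64_t, Die> dies) : dies_(std::move(dies)) {}
  std::optional<ParsedType> ParseType(std::uint64_t offset) const override {
    auto it = dies_.find(offset);
    if (it == dies_.end()) return std::nullopt;
    return ParsedType{it->second.at_name, it->second.at_byte_size};
  }
private:
  std::map<std::uint64_t, Die> dies_;
};

// PDB-flavoured toy: records keyed by type index. Storing the size in bits is
// an invented quirk, to show format details staying inside the parser.
class PdbLikeParser : public DebugInfoASTParser {
public:
  struct Record { std::string type_name; std::uint64_t size_in_bits; };
  explicit PdbLikeParser(std::map<std::uint64_t, Record> records) : records_(std::move(records)) {}
  std::optional<ParsedType> ParseType(std::uint64_t type_index) const override {
    auto it = records_.find(type_index);
    if (it == records_.end()) return std::nullopt;
    return ParsedType{it->second.type_name, it->second.size_in_bits / 8};
  }
private:
  std::map<std::uint64_t, Record> records_;
};

// Language-side code sees only the interface, so it works for either format.
std::string DescribeType(const DebugInfoASTParser &parser, std::uint64_t uid) {
  auto t = parser.ParseType(uid);
  return t ? t->name + ":" + std::to_string(t->byte_size) : "<unknown>";
}
```

Under this split, adding PDB support for a new language means implementing one more parser, without touching the language's type system.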

It may also be worth exposing these APIs via C (e.g. like llvm-c and clang-c) rather than C++. I heard through the grapevine that one goal of language support is to allow other languages to rely on their own compilers for expression parsing, type/decl representation, etc. That would be super convenient, except that C++ FFI is kind of a nightmare.
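The usual way to put a C interface in front of C++ code, and the pattern llvm-c follows, is an opaque handle plus `extern "C"` functions that create, query and dispose of it; any language with a C FFI can then call it. A minimal sketch, where `ToyTypeRef`, `ToyTypeCreate` and the other functions are hypothetical names:

```cpp
#include <cassert>
#include <string>
#include <utility>

// ---- What the C header would expose (hypothetical API) ----
extern "C" {
typedef struct ToyTypeOpaque *ToyTypeRef; // opaque: C callers never see the layout
ToyTypeRef ToyTypeCreate(const char *name, unsigned long long byte_size);
const char *ToyTypeGetName(ToyTypeRef type);
unsigned long long ToyTypeGetByteSize(ToyTypeRef type);
void ToyTypeDispose(ToyTypeRef type);
}

// ---- C++ implementation behind the handle ----
namespace {
class ToyType {
public:
  ToyType(std::string name, unsigned long long size)
      : name_(std::move(name)), size_(size) {}
  const std::string &Name() const { return name_; }
  unsigned long long ByteSize() const { return size_; }
private:
  std::string name_;
  unsigned long long size_;
};
} // namespace

// Convert between the opaque C handle and the C++ object.
static ToyType *Unwrap(ToyTypeRef ref) { return reinterpret_cast<ToyType *>(ref); }
static ToyTypeRef Wrap(ToyType *type) { return reinterpret_cast<ToyTypeRef>(type); }

extern "C" ToyTypeRef ToyTypeCreate(const char *name, unsigned long long byte_size) {
  if (!name) return nullptr;
  return Wrap(new ToyType(name, byte_size));
}
extern "C" const char *ToyTypeGetName(ToyTypeRef type) {
  // The returned string is owned by the handle and valid until it is disposed.
  return type ? Unwrap(type)->Name().c_str() : nullptr;
}
extern "C" unsigned long long ToyTypeGetByteSize(ToyTypeRef type) {
  return type ? Unwrap(type)->ByteSize() : 0;
}
extern "C" void ToyTypeDispose(ToyTypeRef type) { delete Unwrap(type); }
```

Only plain C types cross the boundary, so neither side depends on the other's C++ ABI, which is what makes this friendlier to foreign compilers than binding C++ directly.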