[RFC] Upstreaming LLDB RPC

Objective

We’d like to upstream LLDB RPC, a framework used by Xcode to use LLDB out of process while (mostly) maintaining API compatibility with the LLDB SB API. The framework consists of two components: the RPC client, which clients like Xcode link against, and the RPC server, which links against libLLDB and runs as a separate process. The RPC server can link against both older and newer versions of LLDB and handles the absence of SB APIs gracefully.

Motivation

Xcode has relied on the LLDB RPC framework for roughly the last decade. Over the last two years, we’ve made steady investments in this framework. Specifically, we’re now able to generate most of the client and server code from the headers. We believe both the shared communication layer and the generators could be useful to the community and serve as inspiration or a foundation for users solving a similar challenge.

How It Works

RPC communicates with the LLDB SB API by way of a client-server model. Clients of the RPC framework link against it and interact with an API that looks almost identical to the LLDB SB API, albeit in an rpc namespace. The client spawns and communicates with a separate process that acts as the server. The client takes all arguments and encodes them before sending them to the server. Information about the function, such as its mangled name and arguments, is encoded as a stream of bytes that is then sent to the server side using kernel ports and sockets. The server-side interface decodes the stream of bytes and uses this information to call into the SB API directly. The return value from this call is then encoded by the server and sent back to be decoded by the client side.
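The round trip described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual LLDB RPC wire format: the packet layout, the status byte, the names `encode_call`, `decode_call`, `serve` and `call_remote`, and the stand-in function registered under the mangled name `_Z3addxx` are all hypothetical. The transport here is a direct function call, where the real framework uses kernel ports and sockets.

```python
from __future__ import annotations

import struct


def encode_call(mangled_name: str, args: list[int]) -> bytes:
    """Pack a mangled name and integer arguments into one byte stream."""
    name = mangled_name.encode("utf-8")
    # Hypothetical layout: name length, name bytes, argument count, arguments.
    packet = struct.pack("<I", len(name)) + name
    packet += struct.pack("<I", len(args))
    for arg in args:
        packet += struct.pack("<q", arg)
    return packet


def decode_call(packet: bytes) -> tuple[str, list[int]]:
    """Reverse encode_call: recover the mangled name and the arguments."""
    (name_len,) = struct.unpack_from("<I", packet, 0)
    offset = 4
    name = packet[offset:offset + name_len].decode("utf-8")
    offset += name_len
    (argc,) = struct.unpack_from("<I", packet, offset)
    offset += 4
    args = list(struct.unpack_from(f"<{argc}q", packet, offset))
    return name, args


# Stand-in for the real API: maps a mangled name to a callable (assumption).
SERVER_TABLE = {"_Z3addxx": lambda a, b: a + b}


def serve(packet: bytes) -> bytes:
    """Server side: decode the call, dispatch it, encode the return value."""
    name, args = decode_call(packet)
    handler = SERVER_TABLE.get(name)
    if handler is None:
        # Status byte 1: this server does not provide the requested function.
        return struct.pack("<B", 1)
    return struct.pack("<Bq", 0, handler(*args))


def call_remote(mangled_name: str, args: list[int], transport=serve) -> int:
    """Client side: encode the call, send it, decode the reply."""
    reply = transport(encode_call(mangled_name, args))
    (status,) = struct.unpack_from("<B", reply, 0)
    if status != 0:
        raise LookupError(f"server has no function {mangled_name!r}")
    (result,) = struct.unpack_from("<q", reply, 1)
    return result
```

Calling `call_remote("_Z3addxx", [2, 40])` encodes the request, lets the stand-in server decode and dispatch it, and decodes the reply. Reporting an unknown name through a status byte, instead of crashing, is one possible way for a server to cope with a function it does not provide.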

Every function in the SB API needs an interface on the client and server side in RPC. We have a tool that generates all interfaces based on the header files for the SB API. The tool uses the information from SB API function headers to generate all code for parameter encoding and decoding and for the interactions with the SB API mentioned above. Typically, the client/server interfaces only need to encode the original parameters for the function, but sometimes the tool needs to change the function signature. This happens when an RPC connection must be passed to the function. For example, instance methods implicitly have a connection from the time their class is instantiated, but static methods do not. As such, the tool automatically changes the function signature so that the RPC connection comes before all of the original function parameters.
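The signature rewrite for static methods can be illustrated with a small model. This is a sketch of the rule as described above, not the real generator, which is built on Clang tooling. The `Method` record, the `rpc_params` and `render` helpers, and the connection parameter spelled `rpc::Connection & connection` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Method:
    """Minimal description of an SB API method as a generator might see it."""
    class_name: str
    name: str
    params: list = field(default_factory=list)  # list of (type, name) pairs
    is_static: bool = False


# Hypothetical spelling of the extra parameter; the real type may differ.
CONNECTION_PARAM = ("rpc::Connection &", "connection")


def rpc_params(method: Method) -> list:
    """Return the parameter list of the RPC-side signature."""
    if method.is_static:
        # A static method has no object to take a connection from,
        # so the connection is placed before the original parameters.
        return [CONNECTION_PARAM] + list(method.params)
    # An instance method reuses the connection held by its object.
    return list(method.params)


def render(method: Method) -> str:
    """Format the RPC-side signature as text."""
    params = ", ".join(f"{ptype} {pname}" for ptype, pname in rpc_params(method))
    return f"{method.class_name}::{method.name}({params})"


create = Method("SBDebugger", "Create", [("bool", "source_init_files")], is_static=True)
count = Method("SBDebugger", "GetNumTargets")

print(render(create))
print(render(count))
```

In this model only the static `SBDebugger::Create` gains a leading connection parameter, while the instance method `SBDebugger::GetNumTargets` keeps its original parameter list.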

How It’s Structured

Given that this has been a long-term downstream project, we want to upstream this code in a manner that makes it easy to understand and review. As such, we’re breaking it down into these components:

First is the tool that generates the client/server interfaces. This tool is implemented through various emitters that use Clang tooling to generate the code for these interfaces. On its own, the generated code will not be operational, but this allows us to verify the logic we have for this code. We test this code by using FileCheck to compare the generated output against the output we expect.
Next is the core code that powers RPC. This includes the code for IPC, packets, encoding and decoding, threading, serialization/deserialization and other related code. This code is generic and shared between the client and server interfaces. Once in place, it is what allows the client and server interfaces to communicate with each other. While we don’t have an explicit test suite for this portion of RPC, we have the functionality to run the LLDB API test suite using the RPC client/server interfaces. This test suite needs the RPC core code in order for the interfaces to work. It also relies on RPC-specific versions of the Python binding functionality that exists more generally for LLDB.

Testing

We test RPC by running the pre-existing LLDB SB API test suite against the generated RPC client and server that we build. To do this, the Python bindings that are normally used for the API test suite are repurposed and regenerated for RPC. Using the SWIG bindings on their own is not enough, however, as the functions need an RPC connection, and many functions have that connection prepended as their first argument. In addition to generating the client/server interfaces, the tool mentioned above generates a harness that takes the SB API headers and inserts connections where necessary. This harness is used by SWIG before the actual SWIG bindings are used in the API test suite.

At the end of the upstreaming process would I be able to build and test this on Linux? I think so given that nothing here is Mac OS specific.

What is the challenge exactly, what problems did it solve for Xcode?

From previous work on a GUI debugger I remember:

You don’t want the debug process crashing and taking out the whole debugger.
Stalls in the debug process shouldn’t stall the user interface of the debugger.

I would compare this to the DAP protocol but in that case, the main point is to wrap many different debug servers into one protocol. Here you have the lldb API on both sides. If you consider different lldb versions different debug servers though, kinda the same thing.

Does that mean that upstream testing of this feature would just be moving this testing into public view?

You say most. What sorts of things are left, and what might the typical upstream PR author have to do themselves to keep LLDB RPC working?

I guess that a build would find the problem but we also have to explain to folks how to fix any issues, because they won’t be building with this enabled I expect.

What sorts of things are left?

There are 4 SB API classes and a handful of methods that we don’t generate due to them not being needed by Xcode or due to having deprecated functionality.

And what might the typical upstream PR author have to do themselves to keep LLDB RPC working?

There are 2 primary things to do to keep LLDB RPC working:

Fixing issues in the client/server interfaces of the SB API. Since these interfaces are auto-generated, this means fixing issues in the emitters that generate these files to ensure that they have the correct/expected output.
Fixing issues in the RPC core itself. In our experience, this has been much less common than fixing issues in the emitters.

jingham April 10, 2025, 7:06pm 5

Another nice feature of the lldb-rpc-server (that’s shared by other out-of-process solutions like DAP) is that lldb benefits a lot from caching and reusing all the parsed system libraries. When you run multiple debug sessions to the same target system, many of these standard libraries will be the same from run to run. But the memory for this tends to be large, and if it’s allocated in the memory space of a GUI debugger, it will fragment the memory allocations and slow down the GUI. By making this happen out of process, you avoid that problem, and it’s cheap to just let lldb-rpc-server sit idle when you aren’t using it.

There are a lot of reasons why an out-of-process debugger engine is a good idea.

DAP is a bit reductive because it tries to have a “common debugger feature set”, but that makes it harder to take advantage of the full underlying power of lldb. lldb-rpc-server gives a GUI designer who wants to do more than DAP expresses a stable, performant way to do that.

At the end of the upstreaming process would I be able to build and test this on Linux? I think so given that nothing here is Mac OS specific.

Yes, through upstreaming this you would be able to build it on Linux.

From previous work on a GUI debugger I remember:

You don’t want the debug process crashing and taking out the whole debugger.

Stalls in the debug process shouldn’t stall the user interface of the debugger.

These were the main challenges that RPC solved for Xcode when it was originally developed and it’s grown to be pretty extensive and resilient over the years.

Does that mean that upstream testing of this feature would just be moving this testing into public view?

Yes, this is what we’re doing in upstreaming the testing.

Good to hear that it will be easy to reproduce.

Does this mean that someone wanting to use LLDB RPC would not be able to do so by using a release package (like from Releases · llvm/llvm-project · GitHub) ?

Either yes or no here is fine, but be sure to make it clear early in the LLDB RPC documentation which one it is.

It would be good for debugging the debugger if we were able to take an LLDB we talked to via LLDB RPC and also talk to that same LLDB via normal scripting, without it being two separate builds. Not essential though, because we can do 2 builds.

Does this mean that someone wanting to use LLDB RPC would not be able to do so by using a release package (like from Releases · llvm/llvm-project · GitHub) ?

The RPC bindings would be a part of the upstreamed code, so if RPC is included in a release package, it should be usable from there.

It would be good for debugging the debugger if we were able to take an LLDB we talked to via LLDB RPC and also talk to that same LLDB via normal scripting, without it being two separate builds. Not essential though, because we can do 2 builds.

That would be pretty nice. I’m not sure if it’s possible at the moment, but if need be I can see if that can work.

Another fun use of the RPC server is something @augusto2112 prototyped to support “scripting” LLDB with Swift. Swift is a safe, compiled language, which means that it will take your process down when you do something like an out-of-bounds access. That’s a showstopper because you wouldn’t want your script to take down your debug session. But you can’t just run all the Swift code out-of-process, because now all your SB API calls operate on the LLDB in the Swift process, which is not the LLDB you’re scripting. The prototype would run the Swift code out-of-process and have it use the RPC server to talk to the controlling LLDB instance. There are a lot of caveats and it’s very much a prototype, but I always thought that was a cool idea.

ashgti April 22, 2025, 2:34am 10

What kind of API would the lldb-rpc-server have? Is it roughly the same as the existing SB API? It sounds like it is, but just curious. If it has the same API as the SB API, is this something that can be enabled transparently? (E.g. a setting or something that causes the SB API to run in the server).

In lldb-dap we do have a server mode that allows one to run multiple debug sessions with the same lldb-dap server to have improved cache sharing between debug sessions. This was only added recently, but from my testing so far it’s working pretty well for me today.

I am curious if this is also something we could use in lldb-dap. It may simplify things there if this was handled transparently, but I’m not sure how much work it would be to use the lldb-rpc-server mode instead of the SB API.

jingham April 22, 2025, 7:16pm 11

The lldb-rpc API is as close to the SB API as it can be. On the client side you need to establish a connection to the lldb-rpc-server, and so there are bound to be some APIs that require a connection parameter to function.

But most of the time, the connection can be inferred. For instance, an SBTarget came from an SBDebugger which already holds the connection, so you don’t need to pass a connection to any of the SBTarget APIs.
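The way a connection can be inferred from where an object came from can be modelled with a few plain classes. This is a minimal sketch of the idea only: `Connection`, `Debugger`, `Target` and their methods are hypothetical stand-ins, not the real rpc client classes.

```python
class Connection:
    """Stand-in for a client's link to one lldb-rpc-server process."""

    def __init__(self, name: str):
        self.name = name


class Target:
    """Created by a Debugger, so it inherits that debugger's connection."""

    def __init__(self, connection: Connection, path: str):
        self._connection = connection
        self.path = path

    def describe(self) -> str:
        # No connection parameter needed: the object already carries one.
        return f"{self.path} via {self._connection.name}"


class Debugger:
    def __init__(self, connection: Connection):
        self._connection = connection

    @staticmethod
    def create(connection: Connection) -> "Debugger":
        # Nothing exists yet to infer a connection from, so it is explicit.
        return Debugger(connection)

    def create_target(self, path: str) -> Target:
        # The new target reuses the connection this debugger already holds.
        return Target(self._connection, path)


connection = Connection("server-1")
debugger = Debugger.create(connection)
target = debugger.create_target("a.out")
print(target.describe())
```

Only the entry point takes a connection explicitly. Every object reached from it carries the same connection along, which is why most calls in this model look the same as they would without RPC.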

I’ve added the first PR to start upstreaming this. It adds the emitters for the server-side APIs as well as the code for the emitters’ ClangTool itself: [DRAFT][lldb] Upstream lldb-rpc-gen and LLDB RPC server-side emitters by chelcassanova · Pull Request #136748 · llvm/llvm-project · GitHub