[RFC] lldb-dap refactoring to support async operations and cancellation

Abstract

Using lldb-dap today on a large binary can feel unresponsive (for example, I have been debugging an iOS application built for debugging that is ~650MB + 1.3GB of dSYMs).

If you hit a breakpoint and try to perform operations like stepping or inspecting a variable, it is not uncommon for lldb-dap to block and appear to hang for a bit.

This is primarily due to the sequential nature of how lldb-dap handles requests.

For example, when you hit a breakpoint the DAP flow is:

event 'stopped'
request 'threads'
request 'stackTrace' { levels: 1, threadId: '<stopped-thread>' }
request 'breakpointLocations' { '<stackTrace[0]>' }
request 'stackTrace' { levels: 19, startFrame: 1, threadId: '<stopped-thread>' }
request 'scopes' { frameId: '<stopped-thread-frame>' }
request 'variables' { variablesReference: 1 }

This chain of events alone took ~3s on the app I am debugging, during which you cannot step or continue the process. If you step, there is another delay of ~3s as the same data is fetched for the new stopped event.

Additionally, if you accidentally hover over a very large or complex variable (say a UIViewController with many properties) and then try to step, your step request is blocked by the hover request.

To get a sense of how this is affecting lldb-dap, I did some profiling while stopping and stepping through the large app, which includes Swift, Objective-C and C++ code. Here is a breakdown of where lldb-dap spent its time:

request_variables (35% of my trace)
request_stackTrace (27%)
request_threads (16%)
request_breakpointLocations (8%)
(other…)

The Debug Adapter Protocol helps address this with its support for the cancel request. For example, if you hover over a variable and then move the cursor off the variable before the response has been sent, the client sends a cancel request.

Debug Adapter Protocol Cancel Spec

Proposal

I think to address the responsiveness and improve the overall debugging experience we can refactor lldb-dap to support cancelling requests.

To support the ability to cancel a request we will need to refactor our IO handling and protocol support to break requests into smaller operations that we can try to process as they arrive and support a mechanism for queuing operations.

clangd, also in the LLVM project, has a very similar problem and implements a similar protocol, the Language Server Protocol. In clangd, there is a task dispatcher that queues operations on a per-document basis. We don’t have a precise equivalent in the debugger world.

I started a prototype here: [lldb-dap] Creating a new mechanism for registering async request han… · ashgti/llvm-project@5248b79 · GitHub. It splits lldb_dap::DAP::Loop into a reader thread and a queue. The queue does not yet support cancellation, but this would be a first step toward an async flow.

Additionally, for handling requests, I updated the request handler registration to take a callback, for example:

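void request_foo(
    DAP &dap,
    const FooArguments &Args,
    Callback<FooResponseBody> Reply) {
  // ... do some blocking operation, then schedule an async reply ...
  // Reply is moved into the lambda because the lambda may run after
  // request_foo returns; capturing the parameter by reference would dangle.
  dap.PerformAsyncWork([Reply = std::move(Reply)](auto result) mutable {
    // ... Reply off the request evaluation thread ...
    Reply(result);
  });
}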

With this change, we’re more easily able to break up calls into async operations that then call Reply(resultOrError).
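To make the reply-by-callback shape concrete, here is a minimal, self-contained sketch. Everything in it is hypothetical and stands in for the prototype's types: `Result`, `DeferredWork`, `drain`, and the fields of `FooArguments` and `FooResponseBody` are illustration only and are not lldb-dap APIs. A plain list of deferred tasks plays the role of the worker so that the example has no threading dependencies.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the prototype's types; not actual lldb-dap APIs.
struct FooArguments { int value = 0; };
struct FooResponseBody { int doubled = 0; };

// A reply carries either a response body or an error message.
template <typename T> using Result = std::variant<T, std::string>;
template <typename T> using Callback = std::function<void(Result<T>)>;

// Work that has been scheduled but not yet run. A real adapter would drain
// this from a worker thread; here the caller drains it explicitly.
using DeferredWork = std::vector<std::function<void()>>;

// Schedules the work and returns immediately. The argument value and the
// callback are captured by value so they stay valid after this returns.
void request_foo(DeferredWork &work, const FooArguments &args,
                 Callback<FooResponseBody> reply) {
  assert(reply && "a reply callback is required");
  work.push_back([value = args.value, reply = std::move(reply)] {
    if (value < 0) {
      reply(std::string("value must be non-negative"));
      return;
    }
    reply(FooResponseBody{value * 2});
  });
}

// Runs everything that was scheduled, in order, then empties the list.
void drain(DeferredWork &work) {
  for (auto &task : work)
    task();
  work.clear();
}
```

The handler returns before the reply is produced, which is what lets a dispatcher decide later whether to run, reorder, or drop the scheduled work.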

The basic flow with this change would be to read requests as they come in, append them to a queue, and have a task management mechanism dequeue and dispatch them. For the first iteration of this change, I think we can have just a single worker evaluating requests.

SBDebugger does have a mechanism for requesting an interrupt, SBDebugger::RequestInterrupt(), which attempts to interrupt a running operation. We may be able to use the RequestInterrupt call to stop some in-flight operations, but I think that is something to consider on a case-by-case basis (e.g. we may not want to cancel an in-flight evaluate request, but an in-flight variables request may be okay to cancel).
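The read, queue, and dispatch flow could be sketched roughly as follows. This is not the prototype's code: `Request` and `RequestQueue` are hypothetical names, and the sketch only covers cancelling requests that are still waiting in the queue. Interrupting a request that is already running (for example via SBDebugger::RequestInterrupt()) is deliberately left out.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical request record; a real adapter would carry the parsed JSON.
struct Request {
  int64_t seq;         // DAP sequence number assigned by the client
  std::string command; // e.g. "variables", "stackTrace"
};

class RequestQueue {
public:
  // Called by the reader thread for each decoded request.
  void Push(Request req) {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      assert(!m_closed && "Push after Close");
      m_pending.push_back(std::move(req));
    }
    m_cv.notify_one();
  }

  // Handles a 'cancel' request: drops the target if it has not started yet.
  // Returns false when the request is unknown or was already dequeued.
  bool Cancel(int64_t seq) {
    std::lock_guard<std::mutex> lock(m_mutex);
    for (auto it = m_pending.begin(); it != m_pending.end(); ++it) {
      if (it->seq == seq) {
        m_pending.erase(it);
        return true;
      }
    }
    return false;
  }

  // Called when the input stream ends so the worker can exit.
  void Close() {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_closed = true;
    }
    m_cv.notify_all();
  }

  // Called by the single worker. Blocks until a request is available and
  // returns std::nullopt once the queue is closed and fully drained.
  std::optional<Request> Pop() {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_cv.wait(lock, [this] { return m_closed || !m_pending.empty(); });
    if (m_pending.empty())
      return std::nullopt;
    Request req = std::move(m_pending.front());
    m_pending.pop_front();
    return req;
  }

private:
  std::mutex m_mutex;
  std::condition_variable m_cv;
  std::deque<Request> m_pending;
  bool m_closed = false;
};
```

In this shape the reader thread calls `Push` for ordinary requests and `Cancel` with the `requestId` from a cancel request, while the worker loops on `Pop`. Per the DAP specification, a cancelled request still needs a response, so the caller of `Cancel` would be responsible for sending one when it returns true.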

One major challenge I’ve come across while prototyping is needing some way either to cancel a read or to perform a select-style call on the input stream. The existing SelectHelper only supports selecting on sockets at the moment, not pipes or file handles. This may be something to investigate improving, but I’m not as familiar with the Windows APIs.
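On POSIX systems, one common way to make a blocking read cancellable is to wait on the input descriptor and a second "wake" descriptor at the same time with poll(). The sketch below assumes that approach; `WaitForInput` and `WaitResult` are hypothetical names, it is not how SelectHelper works, and it does not address the Windows side of the problem.

```cpp
#include <cassert>
#include <cerrno>
#include <poll.h>
#include <unistd.h>

enum class WaitResult { Readable, Cancelled, Error };

// Blocks until input_fd has data (or reaches EOF) or until cancel_fd becomes
// readable. cancel_fd is assumed to be the read end of a pipe that another
// thread writes a byte to when it wants the reader to stop waiting.
WaitResult WaitForInput(int input_fd, int cancel_fd) {
  assert(input_fd >= 0 && cancel_fd >= 0);
  struct pollfd fds[2] = {};
  fds[0].fd = input_fd;
  fds[0].events = POLLIN;
  fds[1].fd = cancel_fd;
  fds[1].events = POLLIN;

  for (;;) {
    int rc = poll(fds, 2, /*timeout=*/-1);
    if (rc < 0) {
      if (errno == EINTR)
        continue; // Interrupted by a signal; wait again.
      return WaitResult::Error;
    }
    // Check cancellation first so a pending cancel wins over pending input.
    if (fds[1].revents & (POLLIN | POLLHUP))
      return WaitResult::Cancelled;
    if (fds[0].revents & (POLLIN | POLLHUP))
      return WaitResult::Readable;
    if ((fds[0].revents | fds[1].revents) & (POLLERR | POLLNVAL))
      return WaitResult::Error;
  }
}
```

The reader thread would call `WaitForInput` before each read() and only read when the result is `Readable`, which works for pipes as well as sockets on platforms where poll() supports them.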