proposal: Add a Partial Deadlock Detector For Go · Issue #13759 · golang/go (original) (raw)
The Proposal
Deadlock's make programs fail. Partial deadlocks (where a subset of the goroutines in a system are deadlocked) are difficult to find and debug. While predicting the occurrences of these deadlocks is not computable, detecting them when they occur is. Adding in a partial deadlock detector would be a powerful tool in debugging our concurrent programs.
Like the race detector, a flag-able deadlock detector would use environment variables to log, or exit on detected deadlocks. The deadlock detector would not detect all deadlocks, or potential deadlocks, only a subset of those that are occurring.
The goal of this proposal is to see if such a feature is desirable, and present a plausible implementation. If it is desirable a detailed design doc will follow.
Is this the Halting Problem?
No. If the necessary conditions are met the program will not continue.
Proposed Implementation
High Level
In short the proposed implementation is to "hook" into the garbage collector, and use it's mark phase to build a wait-for graph. Then use Tarjan's algorithm to find deadlock cycles. That is the strongly connect components with no edges to non deadlocked components.
More Detailed
As go uses a mark-sweep garbage collector most of the computationally expensive work necessary for dead lock detection is already being done. Here is how the proposed detector would work (in very high level terms):
- At the beginning of the mark phase all goroutines in a wait state are examined, along with the object they are waiting on.
- As objects are marked (colored black or gray) from the various roots, if the object is one one of the "waited on" resources, it can be added as an edge in our wait for graph.
- Any resource which is not referenced from another root will be added as an edge to a "nothing" node.
- At the end of the mark phase (e.g. post stop the world) run Tarjan's (or similar) to determine the strongly connected components.
- If we have a strongly connected component with no edges to a non deadlocked component, the involved goroutines are deadlocked.
Notes
I have intentionally glossed over:
- Dealing with timers (interruptible waits) and globally referenced resources
- That an object referenced by multiple roots may contain a reference to a waited on object in another root (paths to the waited on object would have to be stored and examined when previously colored nodes are addressed)
- Send and receive mechanics on channels
for simplicity's sake.