[RFC] Intra-procedural Lifetime Analysis in Clang

Utkarsh Saxena @usx95
Dmytro Hrybenko @gribozavr
Yitzhak Mandelbaum @ymand
Jan Voung @jvoung
Kinuko Yasuda @kinu

Summary

Clang’s current lifetime analysis operates locally within a single statement and cannot track object lifetimes across basic blocks or control-flow constructs.

This RFC proposes a new intra-procedural, flow-sensitive lifetime analysis for Clang to detect a broader class of use-after-scope issues, such as use-after-free and use-after-return, particularly those involving stack-allocated variables. The details of the underlying dataflow algorithm are omitted here: this RFC focuses on the goals, user-visible changes, and high-level approach, and is not a detailed design document.

At its core, this analysis performs a form of points-to analysis based on OriginSets and Loans. An OriginSet is a symbolic identifier associated with a pointer-like object (pointer, reference, view), representing a set of possible Loans it could hold. A Loan represents an act of borrowing from a specific memory location. The underlying dataflow analysis and lifetime model are inspired by Rust’s Polonius borrow checker, adapted significantly for C++ semantics.

This approach tracks the set of Loans within a pointer’s OriginSet across control flow. The analysis respects existing annotations (such as clang::lifetimebound, gsl::Pointer, gsl::Owner). It relies on approximations and gradual typing because C++ functions often lack the necessary lifetime annotations (like clang::lifetimebound), and some lifetime contracts are too complex to express in the existing annotation system. Consequently, the analysis assigns an ‘Opaque’ (or Unknown) Loan to an OriginSet when a pointer’s source is unclear, particularly after calls to such functions.

The analysis offers different strictness levels (-Wdangling-safety and -Wdangling-safety-permissive). This configuration allows users to control the sensitivity of the warnings issued, managing the trade-off between finding more potential bugs and reducing false positive reports (as detailed in the Permissive vs. Strict Modes section).

This analysis is intended to eventually supersede Clang’s existing statement-local lifetime checker with strictly more powerful capabilities.

C++ Lifetime Model: An alias-based approach

Inspired by Polonius, this analysis uses a points-to technique based on OriginSets and Loans designed for intuitive understanding. Here’s how it works on a high level:

This focus on tracking the possible sources (Loans) contained within a pointer’s OriginSet and checking their validity upon use aims to make warnings easier to understand and debug than more abstract models (e.g., NLL (non-lexical lifetime) in Rust).

The analysis tracks, through the control flow, the set of loans in each pointer’s OriginSet (written as {…} in the comments below). Consider these examples.

void simple() {
    std::string_view ptr; // ptr's origin set is {} (empty)
    {
        std::string small = "short lived";
        ptr = small; // Taking a reference to 'small' creates a loan 'L' with path 'small'.
                     // ptr's origin set contains Loan L.
    }  // lifetime of 'small' ends => Loan L expires.
    // ptr's origin set is {<expired L>}
    std::cout << ptr; // UaF: origin set contains expired loan L.
}
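
The bookkeeping described in the comments of simple() can be mimicked with a small toy model. This is only a sketch under assumptions: ToyState, borrow, endScope, and useIsDangling are hypothetical names invented for illustration and are not part of Clang or of the proposed implementation.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical toy model; none of these names come from Clang.
using LoanId = int;
using OriginSet = std::set<LoanId>;

struct ToyState {
    std::map<std::string, OriginSet> origins; // pointer variable -> loans it may hold
    std::map<LoanId, std::string> loanPath;   // loan -> borrowed storage (its "path")
    std::set<LoanId> expired;                 // loans whose storage went out of scope
    LoanId nextLoan = 0;

    // Models `ptr = target;`: issues a fresh loan and overwrites ptr's origin set.
    LoanId borrow(const std::string& ptr, const std::string& target) {
        const LoanId loan = nextLoan++;
        loanPath[loan] = target;
        origins[ptr] = OriginSet{loan};
        return loan;
    }

    // Models the end of `target`'s scope: every loan with that path expires.
    void endScope(const std::string& target) {
        for (const auto& [loan, path] : loanPath)
            if (path == target) expired.insert(loan);
    }

    // A use of `ptr` is suspicious if its origin set holds an expired loan.
    bool useIsDangling(const std::string& ptr) const {
        const auto it = origins.find(ptr);
        if (it == origins.end()) return false;
        for (LoanId loan : it->second)
            if (expired.count(loan) != 0) return true;
        return false;
    }
};

// Replays the events of simple() in order.
bool replaySimple() {
    ToyState s;
    s.origins.emplace("ptr", OriginSet{}); // std::string_view ptr;
    s.borrow("ptr", "small");              // ptr = small;
    s.endScope("small");                   // closing brace of the inner block
    return s.useIsDangling("ptr");         // std::cout << ptr;
}
```

In this model replaySimple() reports the use as dangling, matching the UaF comment in the example. The real analysis works on the CFG and not on a linear event trace.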

Origin sets merge at join points in the CFG.

void branch(bool condition) {
    std::string large = "long lived";
    std::string_view ptr = large; // Loan L_large is created; ptr's origin set is {L_large}

    if (condition) {
        std::string small = "short lived";
        ptr = small; // Loan L_small is created. ptr's origin set is {L_small}
    }  // L_small expires
    // Origin sets merge: {L_large, <expired L_small>}
    std::cout << ptr; // UaF: origin set potentially contains expired loan L_small.
}
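
The merge at a join point can be read as a per-variable set union of the states leaving each predecessor block. The sketch below assumes that reading; OriginMap, join, and replayBranchJoin are hypothetical helpers, and loan ids 0 and 1 stand in for L_large and L_small.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using LoanId = int;
using OriginSet = std::set<LoanId>;
using OriginMap = std::map<std::string, OriginSet>; // variable -> origin set

// Join of two predecessor states: union the origin sets variable by variable.
OriginMap join(const OriginMap& a, const OriginMap& b) {
    OriginMap out = a;
    for (const auto& [var, loans] : b)
        out[var].insert(loans.begin(), loans.end());
    return out;
}

// State of `ptr` after the if-statement in branch().
OriginMap replayBranchJoin() {
    OriginMap thenExit; // condition was true: ptr = small
    thenExit["ptr"] = OriginSet{1};
    OriginMap elseExit; // condition was false: ptr still refers to large
    elseExit["ptr"] = OriginSet{0};
    return join(thenExit, elseExit);
}
```

After the join, ptr's origin set holds both loans, so a later use is flagged as soon as either of them has expired.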

Reassignments overwrite the origin set.

void reassignments(bool condition) {
    std::string large = "long lived";
    std::string_view ptr = large; // Loan L_large is created; ptr's origin set is {L_large}

    if (condition) {
        std::string small = "short lived";
        ptr = small; // Loan L_small is created; ptr's origin set is {L_small}
    }  // L_small expires.
    // Origin sets merge: {L_large, <expired L_small>}

    ptr = large; // New loan L_large2 is created with path 'large' at this borrow site.
                 // Reassignment: ptr's origin is now just {L_large2}.
                 // The potential link to '<expired L_small>' is removed.
    std::cout << ptr; // Ok.
}

Pointer assignment propagates the origins.

void pointer_assignments() {
    std::string_view ptr1; // ptr1's origin is {}
    {
        std::string small = "short lived";
        std::string_view ptr2; // ptr2's origin is {}

        ptr2 = small; // L_small; ptr2's origin set is {L_small}
        ptr1 = ptr2;  // Assignment: ptr2 flows into ptr1.
                      // ptr1's and ptr2's origin is {L_small}
    }
    std::cout << ptr1; // UaF; origin contains expired loan L_small.
}

Output origin covers input origins resulting in a union.
When a function has [[clang::lifetimebound]] parameters, its return value’s Origin is constrained by the Origins of those parameters. For functions like below, this means the return Origin effectively contains the union of Loans from all lifetimebound input Origins.

std::string_view max(std::string_view a [[clang::lifetimebound]],
                     std::string_view b [[clang::lifetimebound]]);

void form_subsets() {
    std::string a = "a";
    std::string b = "b";

    std::string_view ptr1 = a; // Loan La; ptr1's origin is {La}
    std::string_view ptr2 = b; // Loan Lb; ptr2's origin is {Lb}

    std::string_view ptr3 = max(ptr1, ptr2);
                               // ptr1's origin is a subset of ptr3's origin.
                               // ptr2's origin is a subset of ptr3's origin.
                               // => ptr3's origin is {La, Lb}
}
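
One way to picture the effect of a call such as max(ptr1, ptr2) is a transfer function that unions the origins of all lifetimebound arguments into the result. This is a minimal sketch under that assumption; Arg and returnOrigin are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <vector>

using LoanId = int;
using OriginSet = std::set<LoanId>;

// One call argument as seen by the toy model.
struct Arg {
    OriginSet origin;   // loans the argument expression may hold
    bool lifetimebound; // whether the matching parameter is annotated
};

// The returned pointer may hold any loan held by a lifetimebound argument.
// Arguments without the annotation contribute nothing in this sketch.
OriginSet returnOrigin(const std::vector<Arg>& args) {
    OriginSet out;
    for (const Arg& arg : args)
        if (arg.lifetimebound)
            out.insert(arg.origin.begin(), arg.origin.end());
    return out;
}
```

With ptr1 holding {La} and ptr2 holding {Lb}, both annotated, the result is {La, Lb}, as in form_subsets().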

Opportunistic Bug finding

Inner types (Structs, Containers): While the core model focuses on Origins associated with top-level variables and expressions (pointers, references, views), we also aim to provide opportunistic bug finding for common patterns involving pointers within aggregate types (struct members, std::pair) or containers (e.g., std::vector).

This approach relies on heuristics and specific knowledge of common types, similar to the existing statement-local analysis (e.g., container of pointers). It is less general than a system with full support for Rust-like lifetime parameters on type definitions, but it allows catching important classes of bugs today. If Clang adopts more explicit lifetime annotations for types, the reliance on this special handling would diminish.

struct S {
    std::string_view a; // Member 'a' has Origin Oa
    std::string_view b; // Member 'b' has Origin Ob
};

S return_struct_with_local() {
    std::string local_str = "local";

    S s; // Instance 's' created. Origins Oa={}, Ob={} initialized.

    s.a = global_str; // Oa = {L_global}
    s.b = local_str;  // Ob = {L_local}
    return s; // Returning 's'.
              // L_local expires.
              // Oa contains {L_global} => Ok
              // Ob contains {L_local} => UaR.
}
std::vector<std::string_view> return_vector_with_local() {
    std::vector<std::string_view /*Inner origin Oi*/> v; // Oi = {}

    std::string local = "local";

    // vector::push_back(T) is [[clang::lifetime_capture_by(this)]];
    v.push_back(local);  // Loan L_local to 'local'.
                         // Oi = {L_local}.
    v.push_back(global); // Loan L_global to 'global'.
                         // Oi = {L_local, L_global}.

    return v; // Returning 'v' associated with Oi containing loan L_local.
} // End scope: L_local expires.

Permissive (-Wdangling-safety-permissive) vs. Strict (-Wdangling-safety) Modes

This lifetime analysis reports potential issues under two different warning flags, -Wdangling-safety-permissive (permissive mode) and -Wdangling-safety (strict mode), corresponding to the analysis’s confidence that a true bug exists.

The core analysis tracks the Origin set (representing the set of Loans it might hold). The difference between the permissive and strict modes then lies in their reporting criteria: the permissive mode typically reports only if the pointer must be dangling (a “must-analysis”), whereas the strict mode reports if the pointer may be dangling (a “may-analysis”).

Warning Trigger Conditions:
We issue a warning under the following conditions:

  1. Loan Expiry: A Loan L, representing a borrow, created at point P_borrow, expires at point P_expire (e.g., stack variable goes out of scope).
  2. Potential Dangling Pointer: At P_expire, any pointer Ptr whose Origin O_ptr contains the (now expired) Loan L is considered potentially dangling.
  3. Liveness and Use: A diagnostic is generated only if such a pointer Ptr is potentially used at a later point P_use (meaning Ptr is “live” at P_expire).
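
The must/may distinction can be sketched as a classification over a pointer's origin set at the point of use. This is one plausible reading and not the RFC's actual criterion, which is not spelled out here; Diag and classifyUse are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

using LoanId = int;
using OriginSet = std::set<LoanId>;

enum class Diag { None, Permissive, Strict };

// Classifies a use of a live pointer, given its origin set and the loans
// that have expired by then.
Diag classifyUse(const OriginSet& origin, const std::set<LoanId>& expired) {
    std::size_t dead = 0;
    for (LoanId loan : origin)
        if (expired.count(loan) != 0) ++dead;
    if (dead == 0) return Diag::None;
    // Every possible source is gone: the pointer "must" dangle.
    if (dead == origin.size()) return Diag::Permissive;
    // Only some sources are gone: the pointer "may" dangle.
    return Diag::Strict;
}
```

Under this reading, an origin set of {L_local} with L_local expired falls in the permissive group, while {L_global, L_local} with only L_local expired is reported in strict mode only.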

Determining the warning group:

std::string global_str = "STATIC";

std::string_view permissive() {
  std::string local = "local";
  std::string_view view = local;  // P_borrow: error: 'local' doesn't live long enough [-Wdangling-safety-permissive]
                 // P_expire: 'local' expires.
  return view;   // P_use: note: returned here.
}

std::string_view strict(bool condition) {
  std::string local = "local";
  std::string_view view = global_str;
  if (condition) {
    view = local;  // P_borrow: error: 'local' doesn't live long enough [-Wdangling-safety]
  } 
  // P_expire: 'local' expires after return.
  return view; // P_use: note: returned here.
}

This allows users to choose between broader detection (strict) and higher confidence with less noise (permissive).

Note: The warning group -Wdangling-safety implies and subsumes -Wdangling-safety-permissive.

Gradual typing: Opaque / Unknown Semantics

It’s common to call functions where the lifetime relationship between inputs and outputs isn’t explicitly declared (e.g., missing [[clang::lifetimebound]]) or is too complex to be expressed using existing clang annotations.
When the analysis encounters a pointer or reference initialized from such an “opaque” source:

This approximation is chosen to avoid false positives. Since the analysis doesn’t know when the memory backing the opaque pointer actually becomes invalid, it optimistically assumes that the memory remains valid for the entire duration of the current function, regardless of the strictness mode in use.

std::string_view opaque_view();

void foo() {
    std::string_view x; // Ox = {}
    x = opaque_view();  // Ox = {Opaque}
}

void store(std::string_view* output);

void foo() {
    std::string_view x; // Ox = {}
    store(&x);          // Ox = {Opaque}
}
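
In a toy model, the Opaque loan can be represented by a sentinel that is never added to the set of expired loans. The sketch below assumes that representation; kOpaqueLoan, originOfUnannotatedCall, expire, and mayDangle are hypothetical names.

```cpp
#include <cassert>
#include <set>

using LoanId = int;
using OriginSet = std::set<LoanId>;

// Hypothetical sentinel standing for "borrowed from an unknown source".
constexpr LoanId kOpaqueLoan = -1;

// Origin of a pointer produced by a call without lifetime annotations.
OriginSet originOfUnannotatedCall() { return OriginSet{kOpaqueLoan}; }

// Records that a loan has expired. The opaque loan is assumed to stay valid
// for the whole function, so it is never recorded.
void expire(std::set<LoanId>& expired, LoanId loan) {
    if (loan != kOpaqueLoan) expired.insert(loan);
}

// True if the origin set holds at least one expired loan.
bool mayDangle(const OriginSet& origin, const std::set<LoanId>& expired) {
    for (LoanId loan : origin)
        if (expired.count(loan) != 0) return true;
    return false;
}
```

A pointer whose origin set is {Opaque} therefore never triggers a diagnostic in this model, in either strictness mode.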

Future enhancements

Relation to Rust-like Lifetimes

This analysis draws inspiration from Rust’s Polonius borrow checker. While adapted for C++'s semantics (e.g., handling opaque calls, no enforced exclusivity, configurable strictness), its internal model still uses concepts like Loans and Origins, which are analogous to the formulation of lifetimes in Rust’s Polonius.

This notion of Origins and Loans serves a role closely related to the lifetime/origin tracking in Polonius, providing a conceptual bridge. This proposal develops the necessary CFG-based dataflow infrastructure that could also support more explicit, Rust-style lifetime systems if they were introduced to Clang.

If Clang evolves to include Rust-like lifetime annotations (e.g., annotating pointers and references with fine-grained lifetimes like T& [[clang::lifetime(a)]]), this analysis framework is positioned to directly consume them. User-provided annotations could then directly inform the calculation of which Loans belong to which Origins. For instance, explicit “outlives” constraints (like 'a: 'b in Rust) would translate directly to subset relations between the corresponding Origins, which this analysis could then enforce. This would replace current approximations (like ‘Opaque Loans’ for unknown function calls or heuristics for unannotated parameters) and significantly increase the precision of the analysis. Furthermore, explicit lifetime parameters on types could reduce the need for special handling of nested pointers/views (e.g., within containers or structs, as discussed previously in “Opportunistic Bug finding”) by making their lifetime dependencies clear.
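
As a rough illustration of subset relations between Origins, the sketch below propagates loans along constraints until nothing changes. It assumes the Polonius-style convention that a constraint (from, to) means every loan in origin `from` must also appear in origin `to`; SubsetConstraint and propagate are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using LoanId = int;
using OriginSet = std::set<LoanId>;
using OriginMap = std::map<std::string, OriginSet>;

// (from, to): every loan in origin `from` must also be in origin `to`.
using SubsetConstraint = std::pair<std::string, std::string>;

// Adds loans to origins until all subset constraints hold.
// Terminates because origin sets only grow and the loan universe is finite.
void propagate(OriginMap& origins,
               const std::vector<SubsetConstraint>& constraints) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& [from, to] : constraints) {
            const OriginSet source = origins[from]; // copy: from and to may alias
            OriginSet& target = origins[to];
            const std::size_t before = target.size();
            target.insert(source.begin(), source.end());
            if (target.size() != before) changed = true;
        }
    }
}
```

Given constraints a ⊆ b and b ⊆ c and a loan in a, the loan ends up in c as well, regardless of the order in which the constraints are listed.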

However, the analysis described here delivers value independently for today’s C++ and does not rely on the adoption of Rust-style lifetimes.

RFC

Apart from the overall direction of introducing a more powerful, function-local lifetime analysis in Clang, we also seek feedback on the following points:

Warning Flags and Naming

We propose to add this analysis to Clang under two warning flags: -Wdangling-safety (strict mode) and -Wdangling-safety-permissive (permissive mode).

Other naming schemes considered:

Experimental prefix:

Default Enablement

Code structure

Performance

Performance impact is a key consideration and will be monitored closely during development. While the underlying dataflow analysis approach is expected to be manageable for typical C++ functions, for particularly complex cases (e.g.), we have the option to cap the analysis after a certain number of iterations per function. To maintain reasonable compile times, this bug-finding tool might miss some issues in extremely complex functions, which is an acceptable compromise.
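
The idea of capping the analysis can be sketched with a generic worklist fixpoint that gives up after a fixed number of steps. This is not the RFC's algorithm, whose details are omitted; Block, Result, and analyze are hypothetical, and the toy tracks a single pointer with no kills.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <vector>

using LoanId = int;
using OriginSet = std::set<LoanId>;

struct Block {
    OriginSet gen;                  // loans issued inside this block
    std::vector<std::size_t> succs; // indices of successor blocks
};

struct Result {
    std::vector<OriginSet> out; // loans reaching the end of each block
    bool converged;             // false if the step cap was hit first
};

// Forward may-analysis with a cap on the number of processed blocks.
Result analyze(const std::vector<Block>& cfg, std::size_t maxSteps) {
    Result r{std::vector<OriginSet>(cfg.size()), true};
    std::deque<std::size_t> worklist;
    for (std::size_t i = 0; i < cfg.size(); ++i) worklist.push_back(i);

    std::size_t steps = 0;
    while (!worklist.empty()) {
        if (steps == maxSteps) { // cap reached: stop with a partial result
            r.converged = false;
            break;
        }
        ++steps;
        const std::size_t b = worklist.front();
        worklist.pop_front();
        r.out[b].insert(cfg[b].gen.begin(), cfg[b].gen.end());
        for (std::size_t s : cfg[b].succs) {
            if (s == b) continue; // a self-loop adds nothing new
            const std::size_t before = r.out[s].size();
            r.out[s].insert(r.out[b].begin(), r.out[b].end());
            if (r.out[s].size() != before) worklist.push_back(s);
        }
    }
    return r;
}
```

A caller that sees converged == false would suppress diagnostics for that function, which corresponds to the trade-off of missing some issues in very complex functions.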

Appendix: More examples

std::string_view 
Lifetimebound(std::string_view str [[clang::lifetimebound]]);

std::string_view foo(bool cond) {
  std::string local;
  std::string_view view;
  view = Lifetimebound(local);
  return view; // error: returning reference to stack variable 'local'.
}
int* result;
if (std::unique_ptr<int> ptr = create(); ptr != nullptr) {
  result = ptr.get(); // error: 'result' points to 'ptr' which doesn't live long enough.
}
use(result); // note: later used here.

Lifetime_capture_by(X)

struct S {
  void set(std::string_view x [[clang::lifetime_capture_by(this)]]) { view = x; }
  std::string_view view;
};

void foo() {
  S s;
  if (condition) {
    std::string local;
    s.set(local); // error: 's' captures 'local' which doesn't live long enough.
  }
  use(s); // note: later used here.
}

Container of views: Vector

void foo() {
  std::vector<std::string_view> views;
  if (condition) {
    std::string local;
    views.push_back(local); // error: 'views' captures 'local' which doesn't live long enough.
  }
  use(views); // note: later used here.
}

Container of views: Maps

absl::flat_hash_map<std::string_view, int> views;
for (...) {
  std::string local;
  // UaF only if it's inserting the key but we may choose to always error.
  auto& v = views[local]; // error: captures 'local' which doesn't live long enough.
}
use(views); // note: later used here.

Member Pointers

struct S {
  std::string_view a;
  std::string_view b;
};

void foo() {
  std::string safe;
  S s;
  s.a = safe;
  if (condition) {
    std::string unsafe;
    s.b = unsafe; // error: 's.b' points to 'unsafe' which doesn't live long enough.
  }
  use(s); // note: later used here.
}
// Store a pointer to small scope local object in a member pointer.
class S {
  void foo() {
    std::string local;
    view_ = local; // error: 'view_' points to 'local' which doesn't live long enough.
  }
private: 
  std::string_view view_;
};

Lambdas and callbacks

Async functions accepting callbacks can be annotated with [[clang::lifetime_capture_by(this)]].

thread::Scheduler scheduler;
if (condition) {
    std::string local = "blah";
    scheduler.Add([&]() { return use(local); }); // error: 'scheduler' captures 'local' which doesn't live long enough.
}
scheduler.Join(); // note: later used here.

Suggest lifetime annotations

These annotations remain the only way to convey lifetimes across function boundaries and high-quality suggestions have previously helped us uncover several bugs.

std::string_view TrimPrefix(std::string_view in [[clang::lifetimebound]]);
std::string_view TrimSuffix(std::string_view in [[clang::lifetimebound]]);

std::string_view Trim(std::string_view in) { // error: missing lifetimebound on 'in'.
  return TrimPrefix(TrimSuffix(in));
}
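
For reference, this is what Trim could look like once such a suggestion is applied. The function bodies are assumptions made so that the example is self-contained (the RFC only shows declarations), and TOY_LIFETIMEBOUND is a hypothetical macro that lets the code also build with compilers that lack the attribute.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical portability macro: expands to the attribute where available.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::lifetimebound)
#    define TOY_LIFETIMEBOUND [[clang::lifetimebound]]
#  endif
#endif
#ifndef TOY_LIFETIMEBOUND
#  define TOY_LIFETIMEBOUND
#endif

// Assumed behavior: strip leading spaces.
std::string_view TrimPrefix(std::string_view in TOY_LIFETIMEBOUND) {
    const std::size_t first = in.find_first_not_of(' ');
    if (first == std::string_view::npos) return in.substr(in.size());
    return in.substr(first);
}

// Assumed behavior: strip trailing spaces.
std::string_view TrimSuffix(std::string_view in TOY_LIFETIMEBOUND) {
    const std::size_t last = in.find_last_not_of(' ');
    if (last == std::string_view::npos) return in.substr(0, 0);
    return in.substr(0, last + 1);
}

// With the annotation added, callers of Trim are checked the same way as
// callers of TrimPrefix and TrimSuffix.
std::string_view Trim(std::string_view in TOY_LIFETIMEBOUND) {
    return TrimPrefix(TrimSuffix(in));
}
```

The annotation does not change runtime behavior; it only tells the analysis that the result may refer to the storage behind `in`.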

Pointer/Iterator invalidation

std::vector<int> v = {1, 2, 3, 4};
auto it = std::find(v.begin(), v.end(), 1);
v.push_back(5); // error: 'it' is not valid anymore.
use(*it); // use-after-free
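
One common way to avoid this pattern is to remember an index in place of the iterator, since indices stay meaningful when push_back reallocates. The function below is a hypothetical illustration and not part of the proposal.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Looks up an element, grows the vector, then reads the element again
// without touching the iterator obtained before the growth.
int useAfterGrowth() {
    std::vector<int> v = {1, 2, 3, 4};
    const auto it = std::find(v.begin(), v.end(), 1);
    const std::size_t index = static_cast<std::size_t>(it - v.begin());
    v.push_back(5);  // may reallocate: 'it' must not be used past this point
    return v[index]; // the index still refers to the same element
}
```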