[ty] Update SpecializationBuilder hook to get both lower/upper bounds by dcreager · Pull Request #23848 · astral-sh/ruff (original) (raw)

@dcreager added internal

An internal refactor or improvement

ty

Multi-file analysis & type inference

labels

Mar 10, 2026

@dcreager dcreager marked this pull request as ready for review

March 10, 2026 00:51

@carljm carljm removed their request for review

March 10, 2026 01:00

dcreager added a commit that referenced this pull request

Mar 17, 2026

@dcreager

…#23881)

For awhile we've known that our constraint sets can balloon in size when they involve large unions, and especially intersections of large unions. And we've had ecosystem runs (typically involving projects that depend on numpy) that trigger this pathological behavior. #23848 is the latest example, with a 25× performance regression for the mesonbuild project.

Guillaume Duboc just defended his PhD thesis in January, and in §11.2, he calls out an optimization first introduced by Frisch for handling these kinds of unions more efficiently. The approach is also described in a post on the Elixir blog. (Frisch and Duboc are both using these BDDs to represent types, whereas we're using them to represent constraints on types, but the same ideas apply.)

Frisch describes the basic idea, which is to add an "uncertain" branch to each BDD node, turning them into ternary decision diagrams (TDDs). Frisch also provides TDD construction rules for union (OR), intersection (AND), and difference. Duboc takes this further and derives more efficient rules for intersection and difference.

This PR implements TDDs and Frisch's and Duboc's construction rules. I've confirmed that this completely eliminates the performance regression for mesonbuild on #23848.


More details on how this works, and why we get these savings:

The key benefit is that they let us represent unions more efficiently. As a simple example, with our quasi-reduced BDDs from before, a ∨ b becomes:

<n1> a
┡━₁ <n2> b
│   ┡━₁ true
│   └─₀ true
└─₀ <n3> b
    ┡━₁ true
    └─₀ false

With TDDs, the rhs of a ∨ b is moved into the new "uncertain" branch:

<t1> a
┡━₁ true
├─? <t2> b
│   ┡━₁ true
│   ├─? false
│   └─₀ false
└─₀ false

We already have some savings, since the TDD representation "allows unions to be kept lazy, postponing expansion until needed for intersection or difference operations". We "park" the rhs as-is into the uncertain branch, so we only need one (existing) copy of it. In the BDD case, we had to fold the rhs into the a = true case, creating an entire (modified) copy of its subgraph. That means we only need 2 nodes for the TDD instead of 3 for the BDD. (With only two variables, this might not seem like a lot, but we've actually gone from O(n²) nodes to O(n).)

We get even more savings when with more complex formulas, like (a ∨ b) ∧ (c ∨ d). With BDDs, we get:

<n1> a                      <n7> a
┡━₁ <n2> b                  ┡━₁ <n8> b
│   ┡━₁ true                │   ┡━₁ <n4> c
│   └─₀ true                │   │   ┡━₁ <n5> d
└─₀ <n3> b                  │   │   │   ┡━₁ true
    ┡━₁ true                │   │   │   └─₀ true
    └─₀ false               │   │   └─₀ <n6> d
                            │   │       ┡━₁ true
<n4> c                      │   │       └─₀ false
┡━₁ <n5> d                  │   └─₀ <n4> SHARED
│   ┡━₁ true                └─₀ <n9> b
│   └─₀ true                    ┡━₁ <n4> SHARED
└─₀ <n6> d                      └─₀ false
    ┡━₁ true
    └─₀ false

With TDDs, we get:

<t1> a                      <t5> a
┡━₁ true                    ┡━₁ <t3> c
├─? <t2> b                  │   ┡━₁ true
│   ┡━₁ true                │   ├─? <t4> d
│   ├─? false               │   │   ┡━₁ true
│   └─₀ false               │   │   ├─? false
└─₀ false                   │   │   └─₀ false
                            │   └─₀ false
<t3> c                      ├─? <t6> b
┡━₁ true                    │   ┡━₁ <t3> SHARED
├─? <t4> d                  │   ├─? false
│   ┡━₁ true                │   └─₀ false
│   ├─? false               └─₀ false
│   └─₀ false
└─₀ false

That's 7 nodes for the BDDs, and 4 for TDDs — still linear in the total number of variables, even though our BDDs are only quasi-reduced. And also note that we never had to modify the TDD that represented the rhs of the AND (t3 = c ∨ d).

@dcreager

@dcreager

@dcreager dcreager marked this pull request as ready for review

March 17, 2026 18:39

ibraheemdev added a commit that referenced this pull request

Mar 17, 2026

@ibraheemdev

…narrowing (#24025)

#23848 (comment) got me thinking about how to short-circuit earlier during generic call inference.

@dcreager

ibraheemdev

@dcreager

ibraheemdev

@dcreager

@dcreager dcreager deleted the dcreager/solutions-hook branch

March 24, 2026 13:54

carljm added a commit that referenced this pull request

Mar 25, 2026

@carljm

dcreager added a commit that referenced this pull request

Apr 21, 2026

@dcreager

…rs (#24383)

The ConstraintSet::solutions method returns a (set of) solutions for a constraint set — assignments of specific types to each typevar in question. #23848 introduced two variants of this method. One of them (solutions_with_inferable) would take in the set of inferable typevars. This was used in a cycle check at the beginning of the method, to make sure that we only considered the typevars we're actually solving for when detecting a cyclic constraint set. More importantly, it was also used to limit the result, so that we would only get solutions for ther inferable typevars (i.e., the ones that we're using the constraint set to solve for).

A cleaner approach is to use extensional quantification to remove the non-inferable typevars from the constraint set before calculating solutions. We already had this available as a lower-level TDD method (abstract_one_inner), which lets us provide an arbitrary should_remove callback to determine which constraints to keep and which to remove. We just need to add a new public API method that provides a should_remove callback that keeps only the constraints involving inferable typevars.

Given this change we can actually remove the cyclic checks completely, since SequentMap and PathAssignments will already bottom out if they encounter a cycle in the constraints. (Specifically, while we're walking TDD paths, PathAssignments will only add constraints that aren't already present in the current path.)

KotlinIsland pushed a commit to KotlinIsland/basedpython that referenced this pull request

May 1, 2026

@ibraheemdev

ibraheemdev added a commit that referenced this pull request

May 15, 2026

@ibraheemdev

…ls (#24506)

This regressed in #23848 because we didn't have any test coverage for the collection literal case.

thejchap pushed a commit to thejchap/ruff that referenced this pull request

May 23, 2026

@ibraheemdev @thejchap

anishgirianish pushed a commit to anishgirianish/ruff that referenced this pull request

May 28, 2026

@ibraheemdev @anishgirianish

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})