[ty] Update SpecializationBuilder hook to get both lower/upper bounds by dcreager · Pull Request #23848 · astral-sh/ruff (original) (raw)
added internal
An internal refactor or improvement
Multi-file analysis & type inference
labels
dcreager marked this pull request as ready for review
carljm removed their request for review
dcreager added a commit that referenced this pull request
For awhile we've known that our constraint sets can balloon in size when
they involve large unions, and especially intersections of large unions.
And we've had ecosystem runs (typically involving projects that depend
on numpy) that trigger this pathological behavior. #23848 is the latest
example, with a 25× performance regression for the mesonbuild project.
Guillaume Duboc just defended his PhD thesis in January, and in §11.2, he calls out an optimization first introduced by Frisch for handling these kinds of unions more efficiently. The approach is also described in a post on the Elixir blog. (Frisch and Duboc are both using these BDDs to represent types, whereas we're using them to represent constraints on types, but the same ideas apply.)
Frisch describes the basic idea, which is to add an "uncertain" branch to each BDD node, turning them into ternary decision diagrams (TDDs). Frisch also provides TDD construction rules for union (OR), intersection (AND), and difference. Duboc takes this further and derives more efficient rules for intersection and difference.
This PR implements TDDs and Frisch's and Duboc's construction rules.
I've confirmed that this completely eliminates the performance
regression for mesonbuild on #23848.
More details on how this works, and why we get these savings:
The key benefit is that they let us represent unions more efficiently.
As a simple example, with our quasi-reduced BDDs from before, a ∨ b
becomes:
<n1> a
┡━₁ <n2> b
│ ┡━₁ true
│ └─₀ true
└─₀ <n3> b
┡━₁ true
└─₀ falseWith TDDs, the rhs of a ∨ b is moved into the new "uncertain" branch:
<t1> a
┡━₁ true
├─? <t2> b
│ ┡━₁ true
│ ├─? false
│ └─₀ false
└─₀ falseWe already have some savings, since the TDD representation "allows
unions to be kept lazy, postponing expansion until needed for
intersection or difference operations". We "park" the rhs as-is into the
uncertain branch, so we only need one (existing) copy of it. In the BDD
case, we had to fold the rhs into the a = true case, creating an
entire (modified) copy of its subgraph. That means we only need 2 nodes
for the TDD instead of 3 for the BDD. (With only two variables, this
might not seem like a lot, but we've actually gone from O(n²) nodes to
O(n).)
We get even more savings when with more complex formulas, like (a ∨ b) ∧ (c ∨ d). With BDDs, we get:
<n1> a <n7> a
┡━₁ <n2> b ┡━₁ <n8> b
│ ┡━₁ true │ ┡━₁ <n4> c
│ └─₀ true │ │ ┡━₁ <n5> d
└─₀ <n3> b │ │ │ ┡━₁ true
┡━₁ true │ │ │ └─₀ true
└─₀ false │ │ └─₀ <n6> d
│ │ ┡━₁ true
<n4> c │ │ └─₀ false
┡━₁ <n5> d │ └─₀ <n4> SHARED
│ ┡━₁ true └─₀ <n9> b
│ └─₀ true ┡━₁ <n4> SHARED
└─₀ <n6> d └─₀ false
┡━₁ true
└─₀ falseWith TDDs, we get:
<t1> a <t5> a
┡━₁ true ┡━₁ <t3> c
├─? <t2> b │ ┡━₁ true
│ ┡━₁ true │ ├─? <t4> d
│ ├─? false │ │ ┡━₁ true
│ └─₀ false │ │ ├─? false
└─₀ false │ │ └─₀ false
│ └─₀ false
<t3> c ├─? <t6> b
┡━₁ true │ ┡━₁ <t3> SHARED
├─? <t4> d │ ├─? false
│ ┡━₁ true │ └─₀ false
│ ├─? false └─₀ false
│ └─₀ false
└─₀ falseThat's 7 nodes for the BDDs, and 4 for TDDs — still linear in the total
number of variables, even though our BDDs are only quasi-reduced. And
also note that we never had to modify the TDD that represented the rhs
of the AND (t3 = c ∨ d).
dcreager marked this pull request as ready for review
ibraheemdev added a commit that referenced this pull request
…narrowing (#24025)
#23848 (comment) got me thinking about how to short-circuit earlier during generic call inference.
dcreager deleted the dcreager/solutions-hook branch
carljm added a commit that referenced this pull request
- main:
[ty] Avoid eager TypedDict diagnostics in
TypedDict | dictunions (#24151)F507: Fix false negative for non-tuple RHS in%-formatting (#24142) [ty] UpdateSpecializationBuilderhook to get both lower/upper bounds (#23848) Fix%foo?parsing in IPython assignment expressions (#24152)E501/W505/formatter: Exclude nested pragma comments from line width calculation (#24071) [ty] Fix Salsa panic propagation (#24141) [ty] Supporttype:ignore[ty:code]suppressions (#24096) [ty] Support narrowing for extended walrus targets (#24129)
dcreager added a commit that referenced this pull request
…rs (#24383)
The ConstraintSet::solutions method returns a (set of) solutions for a
constraint set — assignments of specific types to each typevar in
question. #23848 introduced two variants of this method. One of them
(solutions_with_inferable) would take in the set of inferable
typevars. This was used in a cycle check at the beginning of the method,
to make sure that we only considered the typevars we're actually solving
for when detecting a cyclic constraint set. More importantly, it was
also used to limit the result, so that we would only get solutions for
ther inferable typevars (i.e., the ones that we're using the constraint
set to solve for).
A cleaner approach is to use extensional quantification to remove
the non-inferable typevars from the constraint set before calculating
solutions. We already had this available as a lower-level TDD method
(abstract_one_inner), which lets us provide an arbitrary
should_remove callback to determine which constraints to keep and
which to remove. We just need to add a new public API method that
provides a should_remove callback that keeps only the constraints
involving inferable typevars.
Given this change we can actually remove the cyclic checks completely,
since SequentMap and PathAssignments will already bottom out if they
encounter a cycle in the constraints. (Specifically, while we're walking
TDD paths, PathAssignments will only add constraints that aren't
already present in the current path.)
KotlinIsland pushed a commit to KotlinIsland/basedpython that referenced this pull request
ibraheemdev added a commit that referenced this pull request
…ls (#24506)
This regressed in #23848 because we didn't have any test coverage for the collection literal case.
thejchap pushed a commit to thejchap/ruff that referenced this pull request
anishgirianish pushed a commit to anishgirianish/ruff that referenced this pull request
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})