[ty] defer inference of legacy TypeVar bound/constraints/defaults by carljm · Pull Request #20598 · astral-sh/ruff (original) (raw)
added the ty
Multi-file analysis & type inference
label
carljm added a commit that referenced this pull request
Summary
Typevar attributes (bound/constraints/default) can be either lazily evaluated or eagerly evaluated. Currently they are lazily evaluated for PEP 695 typevars, and eager for legacy and synthetic typevars. #20598 will make them lazy also for legacy typevars, and the ecosystem report on that PR surfaced the issue fixed here (because legacy typevars are much more common in the ecosystem than PEP 695 typevars.)
Applying a transform to a typevar (normalization, materialization, or mark-inferable) will reify all lazy attributes and create a new typevar with eager attributes. In terms of Salsa identity, this transformed typevar will be considered different from the original typevar, whether or not the attributes were actually transformed.
In general, this is not a problem, since all typevars in a given generic context will be transformed, or not, together.
The exception to this was implicit-self vs explicit Self annotations.
The typevar we created for implicit self was created initially using
inferable typevars, whereas an explicit Self annotation is initially
non-inferable, then transformed via mark-inferable when accessed as part
of a function signature. If the containing class (which becomes the
upper bound of Self) is generic, and has e.g. a lazily-evaluated
default, then the explicit-Self annotation will reify that default in
the upper bound, and the implicit-self would not, leading them to be
treated as different typevars, and causing us to fail to solve a call to
a method such as def method(self) -> Self correctly.
The fix here is to treat implicit-self more like explicit-Self, initially creating it as non-inferable and then using the mark-inferable transform on it. This is less efficient, but restores the invariant that all typevars in a given generic context are transformed together, or not, fixing the bug.
In the improved-constraint-solver work, the separation of typevars into "inferable" and "non-inferable" is expected to disappear, along with the mark-inferable transform, which would render both this bug and the fix moot. So this fix is really just temporary until that lands.
There is a performance regression, but not a huge one: 1-2% on most projects, 5% on one outlier. This seems acceptable, given that it should be fully recovered by removing the mark-inferable transform.
Test Plan
Added mdtests that failed before this change.
carljm marked this pull request as ready for review
Co-authored-by: Alex Waygood Alex.Waygood@Gmail.com
dcreager added a commit that referenced this pull request
As part of #20598, we added is_identical_to methods to
TypeVarInstance and BoundTypeVarInstance, which compare when two
typevar instances refer to "the same" underlying typevar, even if we
have forced their lazy bounds/constraints as part of marking typevars as
inferable. (Doing so results in a different salsa interned struct ID,
since we've changed the contents of the bounds_or_constraints field.)
It turns out that marking typevars as inferable is not the only way that we might force lazy bounds/constraints; it also happens when we materialize a type containing a typevar. This surfaced as ecosystem report failures on #20677.
That means that we need a more long-term fix to this problem.
(is_identical_to, and its underlying original field, were meant to
be a temporary fix until we removed the MarkTypeVarsInferable type
mapping.)
This PR extracts out a separate type (TypeVarIdentity) that only
includes the fields that actually inform whether two typevars are "the
same". All other properties of the typevar (default, bounds/constraints,
etc) still live in TypeVarInstance. Call sites that care about typevar
identity can now either store just TypeVarIdentity (if they never need
access to those other properties), or continue to store
TypeVarInstance but pull out its identity when performing those "are
they the same typevar" comparisons. (All of this also applies
respectively to BoundTypeVar{Identity,Instance}.) In particular,
constraint sets now work on BoundTypeVarIdentity, and generic contexts
still store a BoundTypeVarInstance (since we might need access to
defaults when specializing), but are keyed on BoundTypeVarIdentity.
KotlinIsland pushed a commit to KotlinIsland/basedpython that referenced this pull request
Summary
Typevar attributes (bound/constraints/default) can be either lazily evaluated or eagerly evaluated. Currently they are lazily evaluated for PEP 695 typevars, and eager for legacy and synthetic typevars. astral-sh/ruff#20598 will make them lazy also for legacy typevars, and the ecosystem report on that PR surfaced the issue fixed here (because legacy typevars are much more common in the ecosystem than PEP 695 typevars.)
Applying a transform to a typevar (normalization, materialization, or mark-inferable) will reify all lazy attributes and create a new typevar with eager attributes. In terms of Salsa identity, this transformed typevar will be considered different from the original typevar, whether or not the attributes were actually transformed.
In general, this is not a problem, since all typevars in a given generic context will be transformed, or not, together.
The exception to this was implicit-self vs explicit Self annotations.
The typevar we created for implicit self was created initially using
inferable typevars, whereas an explicit Self annotation is initially
non-inferable, then transformed via mark-inferable when accessed as part
of a function signature. If the containing class (which becomes the
upper bound of Self) is generic, and has e.g. a lazily-evaluated
default, then the explicit-Self annotation will reify that default in
the upper bound, and the implicit-self would not, leading them to be
treated as different typevars, and causing us to fail to solve a call to
a method such as def method(self) -> Self correctly.
The fix here is to treat implicit-self more like explicit-Self, initially creating it as non-inferable and then using the mark-inferable transform on it. This is less efficient, but restores the invariant that all typevars in a given generic context are transformed together, or not, fixing the bug.
In the improved-constraint-solver work, the separation of typevars into "inferable" and "non-inferable" is expected to disappear, along with the mark-inferable transform, which would render both this bug and the fix moot. So this fix is really just temporary until that lands.
There is a performance regression, but not a huge one: 1-2% on most projects, 5% on one outlier. This seems acceptable, given that it should be fully recovered by removing the mark-inferable transform.
Test Plan
Added mdtests that failed before this change.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})