[ty] defer inference of legacy TypeVar bound/constraints/defaults by carljm · Pull Request #20598 · astral-sh/ruff (original) (raw)

@carljm added the ty

Multi-file analysis & type inference

label

Sep 27, 2025

carljm added a commit that referenced this pull request

Oct 8, 2025

@carljm

Summary

Typevar attributes (bound/constraints/default) can be either lazily evaluated or eagerly evaluated. Currently they are lazily evaluated for PEP 695 typevars, and eager for legacy and synthetic typevars. #20598 will make them lazy also for legacy typevars, and the ecosystem report on that PR surfaced the issue fixed here (because legacy typevars are much more common in the ecosystem than PEP 695 typevars.)

Applying a transform to a typevar (normalization, materialization, or mark-inferable) will reify all lazy attributes and create a new typevar with eager attributes. In terms of Salsa identity, this transformed typevar will be considered different from the original typevar, whether or not the attributes were actually transformed.

In general, this is not a problem, since all typevars in a given generic context will be transformed, or not, together.

The exception to this was implicit-self vs explicit Self annotations. The typevar we created for implicit self was created initially using inferable typevars, whereas an explicit Self annotation is initially non-inferable, then transformed via mark-inferable when accessed as part of a function signature. If the containing class (which becomes the upper bound of Self) is generic, and has e.g. a lazily-evaluated default, then the explicit-Self annotation will reify that default in the upper bound, and the implicit-self would not, leading them to be treated as different typevars, and causing us to fail to solve a call to a method such as def method(self) -> Self correctly.

The fix here is to treat implicit-self more like explicit-Self, initially creating it as non-inferable and then using the mark-inferable transform on it. This is less efficient, but restores the invariant that all typevars in a given generic context are transformed together, or not, fixing the bug.

In the improved-constraint-solver work, the separation of typevars into "inferable" and "non-inferable" is expected to disappear, along with the mark-inferable transform, which would render both this bug and the fix moot. So this fix is really just temporary until that lands.

There is a performance regression, but not a huge one: 1-2% on most projects, 5% on one outlier. This seems acceptable, given that it should be fully recovered by removing the mark-inferable transform.

Test Plan

Added mdtests that failed before this change.

@carljm carljm marked this pull request as ready for review

October 9, 2025 00:02

MichaReiser

AlexWaygood

AlexWaygood

AlexWaygood

carljm

AlexWaygood

@carljm

@carljm

@carljm @AlexWaygood

Co-authored-by: Alex Waygood Alex.Waygood@Gmail.com

@carljm

@carljm

@carljm

@carljm

@carljm

@carljm

dcreager added a commit that referenced this pull request

Oct 14, 2025

@dcreager

As part of #20598, we added is_identical_to methods to TypeVarInstance and BoundTypeVarInstance, which compare when two typevar instances refer to "the same" underlying typevar, even if we have forced their lazy bounds/constraints as part of marking typevars as inferable. (Doing so results in a different salsa interned struct ID, since we've changed the contents of the bounds_or_constraints field.)

It turns out that marking typevars as inferable is not the only way that we might force lazy bounds/constraints; it also happens when we materialize a type containing a typevar. This surfaced as ecosystem report failures on #20677.

That means that we need a more long-term fix to this problem. (is_identical_to, and its underlying original field, were meant to be a temporary fix until we removed the MarkTypeVarsInferable type mapping.)

This PR extracts out a separate type (TypeVarIdentity) that only includes the fields that actually inform whether two typevars are "the same". All other properties of the typevar (default, bounds/constraints, etc) still live in TypeVarInstance. Call sites that care about typevar identity can now either store just TypeVarIdentity (if they never need access to those other properties), or continue to store TypeVarInstance but pull out its identity when performing those "are they the same typevar" comparisons. (All of this also applies respectively to BoundTypeVar{Identity,Instance}.) In particular, constraint sets now work on BoundTypeVarIdentity, and generic contexts still store a BoundTypeVarInstance (since we might need access to defaults when specializing), but are keyed on BoundTypeVarIdentity.

KotlinIsland pushed a commit to KotlinIsland/basedpython that referenced this pull request

May 1, 2026

@carljm

Summary

Typevar attributes (bound/constraints/default) can be either lazily evaluated or eagerly evaluated. Currently they are lazily evaluated for PEP 695 typevars, and eager for legacy and synthetic typevars. astral-sh/ruff#20598 will make them lazy also for legacy typevars, and the ecosystem report on that PR surfaced the issue fixed here (because legacy typevars are much more common in the ecosystem than PEP 695 typevars.)

Applying a transform to a typevar (normalization, materialization, or mark-inferable) will reify all lazy attributes and create a new typevar with eager attributes. In terms of Salsa identity, this transformed typevar will be considered different from the original typevar, whether or not the attributes were actually transformed.

In general, this is not a problem, since all typevars in a given generic context will be transformed, or not, together.

The exception to this was implicit-self vs explicit Self annotations. The typevar we created for implicit self was created initially using inferable typevars, whereas an explicit Self annotation is initially non-inferable, then transformed via mark-inferable when accessed as part of a function signature. If the containing class (which becomes the upper bound of Self) is generic, and has e.g. a lazily-evaluated default, then the explicit-Self annotation will reify that default in the upper bound, and the implicit-self would not, leading them to be treated as different typevars, and causing us to fail to solve a call to a method such as def method(self) -> Self correctly.

The fix here is to treat implicit-self more like explicit-Self, initially creating it as non-inferable and then using the mark-inferable transform on it. This is less efficient, but restores the invariant that all typevars in a given generic context are transformed together, or not, fixing the bug.

In the improved-constraint-solver work, the separation of typevars into "inferable" and "non-inferable" is expected to disappear, along with the mark-inferable transform, which would render both this bug and the fix moot. So this fix is really just temporary until that lands.

There is a performance regression, but not a huge one: 1-2% on most projects, 5% on one outlier. This seems acceptable, given that it should be fully recovered by removing the mark-inferable transform.

Test Plan

Added mdtests that failed before this change.

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})