Issue 47006: PEP 646: Decide on substitution behavior (original) (raw)

Created on 2022-03-13 20:46 by JelleZijlstra, last changed 2022-04-11 14:59 by admin.

Messages (17)

msg415100 - (view)

Author: Jelle Zijlstra (JelleZijlstra) * (Python committer)

Date: 2022-03-13 20:46

We've had some disagreement about the behavior of TypeVarTuple substitution related to PEP 646, and the discussion has now spilled around multiple PRs. I'd like to use this issue to come to an agreement so we don't have to chase through so many different places.

Links:

I'd like to ask that until we come to an agreement we hold off on making any more changes, so we don't have to go back and forth and we ensure that the eventual solution covers all edge cases.

The disagreement is about what to do with TypeVarTuple substitution: the behavior when a generic type is subscripted, like tuple[*Ts][int, str].

There are two possible extreme approaches:

msg415107 - (view)

Author: Matthew Rahtz (matthew.rahtz) *

Date: 2022-03-13 21:59

Thanks for starting this, Jelle - I was a bit unsure about how to proceed here.

Given that https://github.com/python/cpython/pull/31800 is already merged, I'd also propose something halfway between the two extremes: return a sensible substitution when the logic to compute that isn't too onerous, and a new GenericAlias object when it is. The upsides are that we'd probably be able to return reasonable substitutions for the vast majority of cases, and that we wouldn't have to remove what's already been merged. The downsides would be lack of consistency, and the potential for changing rules about what does and doesn't return a full substitution as time goes on and new features are added.

msg415108 - (view)

Author: Matthew Rahtz (matthew.rahtz) *

Date: 2022-03-13 22:01

(Having said that, to be clear: my preferred solution currently would still be the solution where we just return a new GenericAlias for anything involving a TypeVarTuple. The crux is what Serhiy is happy with.)

msg415109 - (view)

Author: Jelle Zijlstra (JelleZijlstra) * (Python committer)

Date: 2022-03-13 22:10

Thanks Matthew! Merged PRs can still be reverted, and we have some time before the feature freeze. I'd like to hear what Guido and Ken think too.

If we go with the GenericAlias substitution, we need to make sure that such aliases still work as base class. That would need some C work to make types.GenericAlias.mro_entries recurse if the alias's origin is itself a GenericAlias. There's a few other subtleties to think about; I can work on that but don't have a ton of time today.

msg415557 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2022-03-19 15:32

I am for consistent behavior. If return GenericAlias(GenericAlias(tuple, Unpack[Ts]), (int, str)) for tuple[*Ts][int, str], we should also return GenericAlias(GenericAlias(list, T), int) for list[T][int], etc. And it will cause multiple problems:

It may be that will need to use it as a fallback for cases like tuple[T, *Ts][*Ts2] (currently it is error). But I am not sure that such cases should be supported.

msg415623 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2022-03-20 16:44

I think I'm with Serhiy, I don't understand the hesitance to transform tuple[*Ts][int, str] into tuple[int, str].

What would be an example of a substitution that's too complex to do?

msg415637 - (view)

Author: Jelle Zijlstra (JelleZijlstra) * (Python committer)

Date: 2022-03-20 21:54

It's simple if you only look at simple examples.

Here are some examples current main (with Serhiy's patch for the Python version of typing) gets wrong:

from typing import * Ts = TypeVarTuple("Ts") T1 = TypeVar("T1") T2 = TypeVar("T2") Tuple[T1, Unpack[Ts], T2][int, Unpack[tuple[int]]] # expect error typing.Tuple[int, *tuple[int]] Tuple[T1, Unpack[Ts], str, T2][int, Unpack[Ts]] # expect error (T2 missing) typing.Tuple[int, str, *Ts] # it put *Ts in the wrong place Tuple[T1, Unpack[Ts], str, T2][int, Unpack[Ts], Unpack[Ts]] # expect error (*Ts can't substitute T2) typing.Tuple[int, *Ts, str, *Ts] class G(Generic[T1, Unpack[Ts], T2]): pass ... G[int] # expect error main.G[int]

We can probably fix that, but I'm not looking forward to implementing the fixed logic in both Python and C. Also, I'm worried that it won't work with future extensions to the type system (e.g., the rumored Map operator) that may go into 3.12 or later versions.

msg415694 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2022-03-21 18:17

The first case will be practically fixed by GH 32030 after chenging the grammar to allow unpacking in index tuple: A[*B].

Two other cases will be fixed by GH 32031. It does not require any C code.

In the last case no error is raised because some error checks are skipped if any of Generic arguments is a TypeVarTuple. We just need to add such checks. This is Python-only code too.

Note that the alternative proposition is even more lenient to errors.

msg415710 - (view)

Author: Matthew Rahtz (matthew.rahtz) *

Date: 2022-03-21 21:33

[Guido]

What would be an example of a substitution that's too complex to do?

We also need to remember the dreaded arbitrary-length tuple. For example, I think it should be the case that:

T = TypeVar('T')
Ts = TypeVarTuple('Ts')
class C(Generic[*Ts]): pass
Alias = C[T, *Ts]
Alias2 = Alias[*tuple[int, ...]]
# Alias2 should be C[int, *tuple[int, ...]]

Ok, this is a bit of a silly example, but if we're committing to evaluating substitutions correctly, we should probably make even this kind of example behave correctly so that users who accidentally do something silly can debug what's gone wrong.

[Serhiy]

A repr can be less readable.

Definitely true.

It will break equality comparison and hashing. Good bye caching.

Huh, I didn't know about this one. Fair enough, this is totally a downside.

What about origin, parameters, args? How will they be calculated?

This could admittedly be thorny. We'd have to think it through carefully. Admittedly also a downside.

It can break code which uses annotations for something. For example it can break dataclasses.

Oh, also interesting - I didn't know about this one either. Could you give an example?

The first case will be practically fixed by GH 32030 after chenging the grammar to allow unpacking in index tuple: A[*B].

We actually deliberately chose not to unpack concrete tuple types - see the description of https://github.com/python/cpython/pull/30398, under the heading 'Starred tuple types'. (If you see another way around it, though, let me know.)

Two other cases will be fixed by GH 32031. It does not require any C code.

I'm also not sure about this one; disallowing unpacked TypeVarTuples in argument lists to generic aliases completely (if I've understood right?) seems like too restrictive a solution. I can imagine there might be completely legitimate cases where the ability to do this would be important. For example:

DType = TypeVar('DType')
Shape = TypeVarTuple('Shape')
class Tensor(Generic[DType, *Shape]): ...
Uint8Tensor = Tensor[uint8, *Shape]
Unit8BatchTensor = Uint8Tensor[Batch, *Shape]

Note that the alternative proposition is even more lenient to errors.

True, but at least it's predictably lenient to errors - I think the repr makes it very clear that "Woah, you're doing something advanced here. You're on your own!" I think it better fits the principle of least astonishment to have something that consistently lets through all errors of a certain class than something that sometimes catches errors and sometimes doesn't.

msg415712 - (view)

Author: Matthew Rahtz (matthew.rahtz) *

Date: 2022-03-21 21:35

P.s. To be clear, (I think?) these are all substitutions that are computable. We could implement the logic to make all these evaluate correctly if we wanted to. It's just a matter of how much complexity we want to allow in typing.py (or in the runtime in general, if we say farmed some of this logic out to a separate module).

msg415734 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2022-03-22 01:01

I'd like to look at this as a case of simplifying something to its simplest canonical form, but no simpler. This is what the existing fixed-typevar expansion does: e.g. tuple[str, T, T][int] becomes tuple[str, int, int].

I propose that we try to agree on a set of rules for what can be simplified further and what cannot, when we have B = C[...]; A = B[...], (IOW A = C[...][...]), for various shapes of the subscripts to C and B. Note that what's relevant for the second subscript is C[...].parameters, so I'll call that "left" below.

  1. Some edge case seems to be that if *tuple[...] is involved on either side we will never simplify. Or perhaps a better rule is that *tuple[...] is never simplified away (but fixed items before and after it may be).

  2. Another edge case is that if neither side has any starred items we will always simplify (since this is the existing behavior in 3.10). This may raise an error if the number of subscripts on the right does not match the number of parameters on the left.

  3. If there's a single *Ts on the left but not on the right, we should be able to simplify, which again may raise an error if there are not enough values on the right, but if there are more than enough, the excess will be consumed by *Ts (in fact that's the only way *Ts is fed).

  4. If there's a *Ts on the right but not on the left, we should not simplify, since whatever we have on the left serves as a constraint for *Ts. (E.g. tuple[int, int][*Ts] constrains *Ts to being (int, int).)

  5. If there's exactly one *Ts on the left and one on the right, we might_ be able to simplify if the prefix and suffix of the parameters match the prefix and suffix of the subscript on the right. E.g. C[int, T, *Ts, float][str, *Ts] can be simplified to C[int, str, *Ts, float]. OTOH C[int, T, *Ts, float][*Ts] cannot be simplified -- but we cannot flag it as an error either. Note that parameters in this example is (T, Ts); we have to assume that typevartuples in parameters are always used as *Ts (since the PEP recognizes no valid unstarred uses of Ts).

TBH case 5 is the most complex and I may have overlooked something. I'm more sure of cases 1-4.

msg415752 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2022-03-22 08:49

Alias = C[T, *Ts] Alias2 = Alias[*tuple[int, ...]]

Alias2 should be C[int, *tuple[int, ...]]

tuple[int, ...] includes also an empty tuple, and in this case there is no value for T.

Oh, also interesting - I didn't know about this one either. Could you give an example?

If origin, parameters, args are a mess, it will definitely break a code which use them.

We actually deliberately chose not to unpack concrete tuple types - see the description of https://github.com/python/cpython/pull/30398, under the heading 'Starred tuple types'. (If you see another way around it, though, let me know.)

You assumed that *tuple[str, bool] in def foo(*args: *tuple[str, bool]) should give foo.annotations['args'] = tuple[str, bool], but it should rather give (str, bool). No confusion with tuple[str, bool].

And one of PEP 646 options is to implement star-syntax only in subscription, not in var-parameter type annotations.

I'm also not sure about this one; disallowing unpacked TypeVarTuples in argument lists to generic aliases completely (if I've understood right?)

No, it will only be disallowed in substitution of a VarType. Tuple[T][*Ts] -- error. Tuple[*Ts][*Ts2] -- ok.

I propose to implement simple and strict rules, and later add support of new cases where it makes sense.

msg415753 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2022-03-22 09:06

  1. Some edge case seems to be that if *tuple[...] is involved on either side we will never simplify. Or perhaps a better rule is that *tuple[...] is never simplified away (but fixed items before and after it may be).

I do not understand this. Do you forbid simplifying of tuple[*Ts, float][str, *tuple[int, ...]] to tuple[str, *tuple[int, ...], float]?

I think that the rule should be that *tuple[X, ...] cannot split between different variables. Or that it cannot substitute a TypeVar. A more strong variant of rule 4.

  1. ... but we cannot flag it as an error either.

I think that it will better to flag it as an error now. Later, after all code be merged and all edge cases be handled we can return here and reconsider this.

There are workarounds for this.

These tricks are common in functional programming.

The rest of the rules match my implementations more or less.

msg416707 - (view)

Author: Matthew Rahtz (matthew.rahtz) *

Date: 2022-04-04 23:28

Apologies for the slow reply - coming back to this now that the docs and pickling issues are mostly sorted.

[Serhiy]

Alias = C[T, *Ts] Alias2 = Alias[*tuple[int, ...]]

Alias2 should be C[int, *tuple[int, ...]]

tuple[int, ...] includes also an empty tuple, and in this case there is no value for T.

This was my initial intuition too, but Pradeep pointed out to me in https://github.com/python/cpython/pull/31021#discussion_r815853784 that for tuple[int, ...], Python has chosen the opposite mindset: instead of assuming the worst-case scenario, we assume the best-case scenario. Thus, the following type-checks correctly with mypy (https://mypy-play.net/?mypy=latest&python=3.10&gist=b9ca66fb7d172f939951a741388836a6):

def return_first(tup: tuple[int, ...]) -> int: return tup[0] tup: tuple[()] = () return_first(tup)

We actually deliberately chose not to unpack concrete tuple types - see the description of https://github.com/python/cpython/pull/30398, under the heading 'Starred tuple types'. (If you see another way around it, though, let me know.)

You assumed that *tuple[str, bool] in def foo(*args: *tuple[str, bool]) should give foo.annotations['args'] = tuple[str, bool], but it should rather give (str, bool). No confusion with tuple[str, bool].

Fair point, we could technically distinguish between tuple[str, bool] and (str, bool). But if I was a naive user and I saw foo.__annotations__['args'] == (str, bool), I don't think it'd be immediately obvious to me that the type of args was *tuple[str, bool].

Also though, there's a second reason mentioned in https://github.com/python/cpython/pull/30398 why (str, bool) wouldn't be the best choice. We decided that the runtime behaviour of *args: *something should be that we essentially do (*something,)[0]. If we made tuple[int, str] unpack to (int, str), then we'd end up with __annotations__['args'] == (int,).

And one of PEP 646 options is to implement star-syntax only in subscription, not in var-parameter type annotations.

As in, we would allow Generic[*Ts], but not *args: *Ts? That'd be a major change to the PEP - not an option I'm willing to consider at this stage in the process.

I'm also not sure about this one; disallowing unpacked TypeVarTuples in argument lists to generic aliases completely (if I've understood right?)

No, it will only be disallowed in substitution of a VarType. Tuple[T][*Ts] -- error. Tuple[*Ts][*Ts2] -- ok.

Ah, gotcha. My mistake.

[Guido]

I ran out of time this evening :) Will reply properly soon.

msg416795 - (view)

Author: Matthew Rahtz (matthew.rahtz) *

Date: 2022-04-05 18:21

[Guido]

  1. Some edge case seems to be that if *tuple[...] is involved on either side we will never simplify.

Alright, let me think this through with some examples to get my head round it.

It would prohibit the following difficult case:

class C(Generic[*Ts]): ... Alias = C[T, *Ts] Alias[*tuple[int, ...]] # Does not simplify; stays C[T, *Ts][*tuple[int, ...]]

That seems pretty reasonable. It would also prohibit these other relatively simple cases, but I guess that's fine:

Alias = C[*Ts] Alias[*tuple[int, ...]] # Does not simplify; stays C[*Ts][*tuple[int, ...]]

Alias = C[T, *tuple[int, ...]] Alias[str] # Does not simplify; stays C[T, *tuple[int, ...]][str]

Or perhaps a better rule is that *tuple[...] is never simplified away (but fixed items before and after it may be).

Is this to say that we effectively prohibit binding *tuple[...] to anything? If we can simplify without binding *tuple[...] to anything, then we do simplify, but otherwise, we don't simplify? So under this rule, the following WOULD work?

Alias = C[T, *tuple[int, ...]] Alias[str] # Simplifies to C[str, *tuple[int, ...]], because we didn't have to bind *tuple[int, ...] to do it

  1. Another edge case is that if neither side has any starred items we will always simplify (since this is the existing behavior in 3.10). This may raise an error if the number of subscripts on the right does not match the number of parameters on the left.

Alright, so this is business as usual.

  1. If there's a single *Ts on the left but not on the right, we should be able to simplify, which again may raise an error if there are not enough values on the right, but if there are more than enough, the excess will be consumed by *Ts (in fact that's the only way *Ts is fed).

So then:

class C(Generic[*Ts]): ... Alias = C[T, *Ts] Alias[()] # Raises error Alias[int] # Simplifies to C[int, *Ts] Alias[int, str] # Simplifies to C[int, str] Alias[int, str, bool] # Simplifies to C[int, str, bool]

Yup, seems straightforward.

  1. If there's a *Ts on the right but not on the left, we should not simplify, since whatever we have on the left serves as a constraint for *Ts.

Ok, so this is about the following situations:

class C(Generic[*Ts]): ... Alias = C[T1, T2] Alias[*Ts] # Does not simplify; stays C[T1, T2][*Ts]

Yikes - in fact, this is actually super hairy; I hadn't thought about this edge case at all in the PEP.

Agreed that it seems reasonable not to simplify here.

E.g. tuple[int, int][*Ts] constrains *Ts to being (int, int).

Was that a typo? Surely tuple[int, int][*Ts] isn't valid - since tuple[int, int] doesn't have any free parameters?

  1. If there's exactly one *Ts on the left and one on the right, we might_ be able to simplify if the prefix and suffix of the parameters match the prefix and suffix of the subscript on the right. E.g. C[int, T, *Ts, float][str, *Ts] can be simplified to C[int, str, *Ts, float]. OTOH C[int, T, *Ts, float][*Ts] cannot be simplified -- but we cannot flag it as an error either. Note that parameters in this example is (T, Ts); we have to assume that typevartuples in parameters are always used as *Ts (since the PEP recognizes no valid unstarred uses of Ts).

Ok, this also makes sense.


Still, though, doesn't the point that Serhiy brought up about origin, parameters and args still apply? In cases where we don't simplify, there'd still be the issue of what we'd set these things to be.

This evening I'll also revisit the PRs adding tests for substitution to try and make them a comprehensive reference as to what's currently possible.

msg416813 - (view)

Author: Matthew Rahtz (matthew.rahtz) *

Date: 2022-04-05 20:03

Ok, https://github.com/python/cpython/pull/32341/files is a reference of how the current implementation behaves. Fwiw, it is mostly correct - with a few minor tweaks it might be alright for at least the 3.11 release.

In particular, instead of dealing with the thorny issue of what to do about splitting unpacked arbitrary-length tuples over multiple type variables - e.g. C[T, *Ts][*tuple[int, ...]] - instead either deciding to try and evaluate it properly and living with the complexity, or leaving it unsimplified and living with the args, parameters and origin problem - for now, we could just raise an exception for any substitutions which involve an unpacked arbitrary-length tuple, since I'd guess it's going to be an extremely rare use-case.

msg416913 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2022-04-07 01:50

We need to move on this, because the outcome of this discussion is a release blocker for 3.11b1 -- the next release!

History

Date

User

Action

Args

2022-04-11 14:59:57

admin

set

github: 91162

2022-04-07 01:50:13

gvanrossum

set

priority: normal -> release blocker
type: behavior
messages: +

2022-04-05 20:03:02

matthew.rahtz

set

messages: +

2022-04-05 19:56:58

matthew.rahtz

set

keywords: + patch
stage: patch review
pull_requests: + <pull%5Frequest30396>

2022-04-05 18:21:12

matthew.rahtz

set

messages: +

2022-04-04 23:28:09

matthew.rahtz

set

messages: +

2022-03-22 09:06:58

serhiy.storchaka

set

messages: +

2022-03-22 08:49:01

serhiy.storchaka

set

messages: +

2022-03-22 01:01:21

gvanrossum

set

messages: +

2022-03-21 21:35:34

matthew.rahtz

set

messages: +

2022-03-21 21:33:50

matthew.rahtz

set

messages: +

2022-03-21 18:17:54

serhiy.storchaka

set

messages: +

2022-03-20 21:54:06

JelleZijlstra

set

messages: +

2022-03-20 16:44:40

gvanrossum

set

messages: +

2022-03-19 15:32:54

serhiy.storchaka

set

messages: +

2022-03-13 22:57:37

AlexWaygood

set

nosy: + AlexWaygood

2022-03-13 22:10:02

JelleZijlstra

set

messages: +

2022-03-13 22:01:18

matthew.rahtz

set

messages: +

2022-03-13 21:59:28

matthew.rahtz

set

nosy: + matthew.rahtz
messages: +

2022-03-13 20:46:48

JelleZijlstra

set

nosy: + mrahtz

2022-03-13 20:46:16

JelleZijlstra

create