[Python-Dev] PEP 563: Postponed Evaluation of Annotations

Nick Coghlan ncoghlan at gmail.com
Mon Nov 6 00:55:12 EST 2017

On 6 November 2017 at 14:40, Lukasz Langa <lukasz at langa.pl> wrote:

> On 4 Nov, 2017, at 6:32 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>
>> The only workaround I can see for that breakage is that instead of using strings, we could instead define a new "thunk" type that consists of two things:
>>
>> 1. A code object to be run with eval()
>> 2. A dictionary mapping from variable names to closure cells (or None for not yet resolved references to globals and builtins)
>
> This is intriguing.
>
> 1. Would that only be used for type annotations? Any other interesting things we could do with them?

Yes, they'd have the potential to replace strings for at least some data analysis use cases, where passing in lambdas is too awkward syntactically, since you have to spell out all the parameters.

The pandas.DataFrame.query operation is a reasonable example of that kind of thing: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.query.html (not an exact example, since pandas uses a Python-like expression language rather than Python itself).

Right now, folks tend to use strings for this kind of use case, which has the same performance problem that pre-f-string string formatting does: it defers the expression parsing and compilation step until runtime, rather than being able to do it once and then cache the result in __pycache__.
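A rough, standard-library-only illustration of the compile-once point (the expression, the `row` data and the helper names are invented for this example; this is not how pandas implements query()). Passing a string to eval() re-parses and re-compiles it on every call, whereas a code object returned by compile() can be reused. Timings are machine-dependent, so no figures are given.

```python
import timeit

# Hypothetical query expression and data row, for illustration only.
EXPR = "price * qty > 100"
row = {"price": 12.5, "qty": 10}


def eval_string(ns):
    # String-based: eval() parses and compiles EXPR afresh on every call.
    return eval(EXPR, {}, ns)


# Compile once and keep the code object (roughly what an ahead-of-time
# compiled thunk would give you for free).
CODE = compile(EXPR, "<query>", "eval")


def eval_code(ns):
    # Reuses the already-compiled code object; only evaluation happens here.
    return eval(CODE, {}, ns)


print(eval_string(row), eval_code(row))  # True True
print("string:", timeit.timeit(lambda: eval_string(row), number=10_000))
print("code:  ", timeit.timeit(lambda: eval_code(row), number=10_000))
```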

> 2. It feels to me like that would make annotations heavier at runtime instead of leaner, since now we're forcing the relevant closures to stay in memory.

Cells are pretty cheap (they're just a couple of pointers), and if they're references to module or class attributes, the object referenced by the cell would have remained alive regardless.

Even for nonlocal variable references (which a solely string-based approach would disallow), the referenced objects will already be getting kept alive anyway by way of the typing machinery.
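A small illustration of the cell point, using only standard CPython introspection attributes (`__closure__`, `co_freevars`, `cell_contents`); the function and variable names are made up. The cell merely points at the existing object rather than copying it, so the object's lifetime is governed by everything else that references it. The byte count reported by sys.getsizeof() varies by build, so it isn't shown here.

```python
import sys


def make_reader():
    config = {"strict": True}  # the object the closure refers to

    def read():
        return config["strict"]

    return read, config


read, config = make_reader()
cell = read.__closure__[0]  # one cell per free variable of read()

# The cell is a thin wrapper: it refers to the very same object, not a copy.
print(read.__code__.co_freevars, cell.cell_contents is config)
print(type(cell).__name__, sys.getsizeof(cell), "bytes")
```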

> 3. This form of lazy evaluation seems pretty implicit to me for the reader. Peter Ludemann's example of magic logging.debug() is a case in point here.

One of the biggest advantages, though, is that, just like functions, all of the necessary logic for doing the delayed evaluation can be captured in a __call__ method, rather than via elaborate instructions on how to appropriately invoke eval() based on knowledge of where the annotation came from.

This is especially important if typing gets taken out of the standard library: you'll need a replacement for the typing.get_type_hints() usage in PEP 563, and a thunk.__call__() method would be a good spelling for that.
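To make the two-part thunk a little more concrete, here's a minimal sketch under stated assumptions: the `Thunk` class and `make_annotation_thunk()` helper are hypothetical (nothing like them exists in the stdlib), the expression is compiled in plain "eval" mode rather than with the class-body-style compilation discussed in this message, and the captured cells are simply read into a locals mapping at call time. Names mapped to None fall back to the captured globals and then builtins, as in the original description.

```python
from types import CodeType
from typing import Any, Dict, Optional


class Thunk:
    """Hypothetical thunk: a compiled expression plus a mapping from free
    names to closure cells (None = resolve via globals/builtins later)."""

    def __init__(self, code: CodeType, globals_: Dict[str, Any],
                 cells: Dict[str, Optional[Any]]) -> None:
        self.code = code
        self.globals = globals_
        self.cells = cells

    def __call__(self) -> Any:
        # Read each cell at call time, so rebinding the captured variable
        # after the thunk was created is still visible (late binding).
        local_ns = {name: cell.cell_contents
                    for name, cell in self.cells.items() if cell is not None}
        # Anything not in local_ns is looked up in self.globals, then builtins.
        return eval(self.code, self.globals, local_ns)


def make_annotation_thunk() -> Thunk:
    Item = int  # a function-local name a plain string could not see later

    def _capture():
        return Item  # referencing Item makes the compiler keep it in a cell

    cell = _capture.__closure__[0]
    code = compile("(Item, str)", "<annotation>", "eval")
    thunk = Thunk(code, globals(), {"Item": cell, "str": None})
    Item = float  # rebinding updates the shared cell the thunk holds
    return thunk


print(make_annotation_thunk()())  # (<class 'float'>, <class 'str'>)
```

Evaluating the thunk just means calling it; no caller needs to know which globals or locals the annotation originally came from.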

> All in all, unless somebody else is ready to step up and write the PEP on this subject (and its implementation) right now, I think this idea will miss Python 3.7.

As long as we don't argue for that being an adequate excuse to rush into "We're using plain strings with ill-defined name resolution semantics because we couldn't be bothered coming up with a proper thunk-based design to evaluate", I'd be fine with that. None of this is urgent, and it's mainly of interest to large organisations that will see a direct economic benefit from implementing it, so the entire idea can easily be delayed to 3.8 if they're not prepared to fund a proper evaluation of the available design options over the next 3 months.

Python's name resolution rules are already ridiculously complicated, and PEP 563 is proposing to make them even worse, purely for the sake of an optional feature primarily of interest to large enterprise users. If delayed evaluation of type annotations is deemed important enough to burden every future Pythonista with learning a second set of name resolution semantics purely for type annotations, then it's important enough to postpone implementing it until someone invests the time in coming up with a competing thunk-based alternative that is able to rely entirely on the existing name resolution semantics.

Exploring that potential thunk-based approach a bit further:

* We'd continue to eagerly compile annotations (as we do today), but treat them like a nested class body with a single expression. Unlike an implicit lambda, this compilation mode would allow the resulting code object to be used with the two-argument form of the exec builtin.
* That code object would be the main item stored on the thunk object.
* If __classcell__ is defined in the current namespace and names from the current namespace are referenced, then that can be captured on the thunk, giving its __call__ method access to any class attributes needed for name resolution.
* Closure references would be captured automatically, but class bodies already allow locals to override nonlocals (for compatibility with pre-populated namespaces returned from __prepare__).
* A thunk's globals reference would be implicitly captured the same way it is for a regular function.
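The existing "locals override nonlocals" rule for class bodies mentioned above can be seen with a metaclass whose __prepare__ pre-populates the namespace. This is a small standalone sketch with arbitrary names; it only demonstrates today's class-body lookup behaviour, not the proposed thunk machinery.

```python
def outer():
    x = "nonlocal"

    class PrePopulated(type):
        @classmethod
        def __prepare__(mcls, name, bases, **kwargs):
            # The namespace handed to the class body before it starts running.
            return {"x": "from __prepare__"}

    class Plain:
        seen = x  # no entry in the class namespace: falls through to the cell

    class Custom(metaclass=PrePopulated):
        seen = x  # the pre-populated class namespace wins over the nonlocal

    return Plain.seen, Custom.seen


print(outer())  # ('nonlocal', 'from __prepare__')
```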

That leaves nested classes as the main problematic case, since they can't see each other's attributes by default. The injected-locals semantics proposed in PEP 563 don't get this right either: they only account for MRO-based name resolution, not lexical nesting, even though the PEP claims the latter is expected to work.

To illustrate the problem:

>>> class C:
...     field = 1
...     class D:
...         field2 = field
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in C
  File "<stdin>", line 4, in D
NameError: name 'field' is not defined

The __class__ reference used for zero-argument super() support doesn't currently solve this problem, since it only extends a single level: the inner class definition hides the outer one from method implementations (and deliberately so).
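For reference, this is the existing zero-argument super() machinery at work: any method that mentions `__class__` gets an implicit cell bound to the innermost enclosing class only, with no link to the outer class. A minimal sketch with arbitrary class names:

```python
class Outer:
    class Inner:
        def owner(self):
            # Mentioning __class__ (or calling zero-argument super()) makes the
            # compiler create an implicit cell bound to the innermost class.
            return __class__


print(Outer.Inner().owner().__qualname__)  # Outer.Inner
print(Outer.Inner.owner.__code__.co_freevars)  # ('__class__',)
```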

There are two main ways of resolving this, with the simplest being to say that type annotations still need to be resolvable using normal closure semantics. That is, the nested class example in the PEP would be changed as follows:

# C is defined at module or function scope, not inside another class
class C:
    field = 'c_field'

    def method(self, arg: field) -> None:  # this is OK
        ...

    def method2(self, arg: C.field) -> None:  # this is OK
        ...

    class D:
        field2 = 'd_field'
        def method(self, arg: C.field) -> C.D.field2:  # this is OK
            ...

        def method2(self, arg: C.field) -> field2:  # this is OK
            ...

        def method3(self, arg: field) -> field2:  # this fails (can't find 'field')
            ...

        def method4(self, arg: C.field) -> D.field2:  # this fails (can't find 'D')
            ...

This means the compiler needs to be involved at least enough to capture references to classes that aren't defined at the top level of a module.
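As a rough baseline, the sketch below stores annotations as plain strings (as PEP 563 does) and evaluates them against the function's module globals only, with no class or function scopes consulted. The `resolve_in_globals()` helper is hypothetical and deliberately simpler than typing.get_type_hints(); it just shows why a name like `D`, which is only reachable lexically inside `C`, can't be found without some help from the compiler.

```python
class C:
    field = 'c_field'

    class D:
        field2 = 'd_field'

        def method(self, arg: "C.field") -> "C.D.field2": ...

        def method4(self, arg: "C.field") -> "D.field2": ...


def resolve_in_globals(func):
    """Evaluate each string annotation of func against func.__globals__.

    Names only reachable through lexical class nesting raise NameError,
    which is recorded instead of the resolved value.
    """
    results = {}
    for name, text in func.__annotations__.items():
        try:
            results[name] = eval(text, func.__globals__)
        except NameError as exc:
            results[name] = exc
    return results


print(resolve_in_globals(C.D.method))  # {'arg': 'c_field', 'return': 'd_field'}
print(resolve_in_globals(C.D.method4))  # 'return' maps to a NameError for 'D'
```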

If you don't use existing closure semantics to solve it, then you'd instead need to either update the compiler to capture a stack of class references, or else reverse engineer something based on __qualname__. However, the latter approach wouldn't work for classes defined inside a function, since there's no navigation path from the module namespace back down to the individual classes; you need a cell reference in that case.
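A minimal sketch of the __qualname__-walking approach and where it breaks down. The `resolve_qualname()` helper is hypothetical: it follows a dotted __qualname__ down from a module namespace, which works for classes nested in classes but hits a dead end at the `<locals>` component CPython uses for anything defined inside a function.

```python
def resolve_qualname(namespace, qualname):
    """Hypothetical helper: follow a dotted __qualname__ from a module namespace."""
    parts = qualname.split(".")
    if "<locals>" in parts:
        # A function scope sits in the path: nothing at module level leads there.
        raise LookupError(f"{qualname!r} passes through a function scope")
    obj = namespace[parts[0]]
    for part in parts[1:]:
        obj = getattr(obj, part)
    return obj


class C:
    class D:
        pass


def factory():
    class Local:
        pass
    return Local


print(resolve_qualname(globals(), C.D.__qualname__) is C.D)  # True
try:
    resolve_qualname(globals(), factory().__qualname__)
except LookupError as exc:
    print(exc)  # 'factory.<locals>.Local' passes through a function scope
```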

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia