[Python-Dev] PEP 572: Assignment Expressions (original) (raw)

Chris Angelico rosuav at gmail.com
Tue Apr 17 03:46:20 EDT 2018

Previous message (by thread): [Python-Dev] Basic test to validate win10 install?
Next message (by thread): [Python-Dev] PEP 572: Assignment Expressions
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Having survived four rounds in the boxing ring at python-ideas, PEP 572 is now ready to enter the arena of python-dev. I'll let the proposal speak for itself. Be aware that the reference implementation currently has a few test failures, which I'm still working on, but to my knowledge nothing will prevent the proposal itself from being successfully implemented.

For those who have seen the most recent iteration on -ideas, the only actual change to the core proposal is that chaining is fully supported now.

Formatted version: https://www.python.org/dev/peps/pep-0572/

ChrisA

PEP: 572 Title: Assignment Expressions Author: Chris Angelico <rosuav at gmail.com> Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 28-Feb-2018 Python-Version: 3.8 Post-History: 28-Feb-2018, 02-Mar-2018, 23-Mar-2018, 04-Apr-2018, 17-Apr-2018

Abstract

This is a proposal for creating a way to assign to names within an expression. Additionally, the precise scope of comprehensions is adjusted, to maintain consistency and follow expectations.

Rationale

Naming the result of an expression is an important part of programming, allowing a descriptive name to be used in place of a longer expression, and permitting reuse. Currently, this feature is available only in statement form, making it unavailable in list comprehensions and other expression contexts. Merely introducing a way to assign as an expression would create bizarre edge cases around comprehensions, though, and to avoid the worst of the confusions, we change the definition of comprehensions, causing some edge cases to be interpreted differently, but maintaining the existing behaviour in the majority of situations.

Syntax and semantics

In any context where arbitrary Python expressions can be used, a named expression can appear. This is of the form target := expr where expr is any valid Python expression, and target is any valid assignment target.

The value of such a named expression is the same as the incorporated expression, with the additional side-effect that the target is assigned that value::

# Handle a matched regex
if (match := pattern.search(data)) is not None:
    ...

# A more explicit alternative to the 2-arg form of iter() invocation
while (value := read_next_item()) is not None:
    ...

# Share a subexpression between a comprehension filter clause and its output
filtered_data = [y for x in data if (y := f(x)) is not None]

Differences from regular assignment statements

Most importantly, since := is an expression, it can be used in contexts where statements are illegal, including lambda functions and comprehensions.

An assignment statement can assign to multiple targets, left-to-right::

x = y = z = 0

The equivalent assignment expression is parsed as separate binary operators, and is therefore processed right-to-left, as if it were spelled thus::

assert 0 == (x := (y := (z := 0)))

Augmented assignment is not supported in expression form::

>>> x +:= 1
  File "<stdin>", line 1
    x +:= 1
        ^
SyntaxError: invalid syntax

Otherwise, the semantics of assignment are identical in statement and expression forms.

Alterations to comprehensions

The current behaviour of list/set/dict comprehensions and generator expressions has some edge cases that would behave strangely if an assignment expression were to be used. Therefore the proposed semantics are changed, removing the current edge cases, and instead altering their behaviour only in a class scope.

As of Python 3.7, the outermost iterable of any comprehension is evaluated in the surrounding context, and then passed as an argument to the implicit function that evaluates the comprehension.

Under this proposal, the entire body of the comprehension is evaluated in its implicit function. Names not assigned to within the comprehension are located in the surrounding scopes, as with normal lookups. As one special case, a comprehension at class scope will eagerly bind any name which is already defined in the class scope.

A list comprehension can be unrolled into an equivalent function. With Python 3.7 semantics::

numbers = [x + y for x in range(3) for y in range(4)]
# Is approximately equivalent to
def <listcomp>(iterator):
    result = []
    for x in iterator:
        for y in range(4):
            result.append(x + y)
    return result
numbers = <listcomp>(iter(range(3)))

Under the new semantics, this would instead be equivalent to::

def <listcomp>():
    result = []
    for x in range(3):
        for y in range(4):
            result.append(x + y)
    return result
numbers = <listcomp>()

When a class scope is involved, a naive transformation into a function would prevent name lookups (as the function would behave like a method)::

class X:
    names = ["Fred", "Barney", "Joe"]
    prefix = "> "
    prefixed_names = [prefix + name for name in names]

With Python 3.7 semantics, this will evaluate the outermost iterable at class scope, which will succeed; but it will evaluate everything else in a function::

class X:
    names = ["Fred", "Barney", "Joe"]
    prefix = "> "
    def <listcomp>(iterator):
        result = []
        for name in iterator:
            result.append(prefix + name)
        return result
    prefixed_names = <listcomp>(iter(names))

The name prefix is thus searched for at global scope, ignoring the class name. Under the proposed semantics, this name will be eagerly bound; and the same early binding then handles the outermost iterable as well. The list comprehension is thus approximately equivalent to::

class X:
    names = ["Fred", "Barney", "Joe"]
    prefix = "> "
    def <listcomp>(names=names, prefix=prefix):
        result = []
        for name in names:
            result.append(prefix + name)
        return result
    prefixed_names = <listcomp>()

With list comprehensions, this is unlikely to cause any confusion. With generator expressions, this has the potential to affect behaviour, as the eager binding means that the name could be rebound between the creation of the genexp and the first call to next(). It is, however, more closely aligned to normal expectations. The effect is ONLY seen with names that are looked up from class scope; global names (eg range()) will still be late-bound as usual.

One consequence of this change is that certain bugs in genexps will not be detected until the first call to next(), where today they would be caught upon creation of the generator. See 'open questions' below.

Recommended use-cases

Simplifying list comprehensions

These list comprehensions are all approximately equivalent::

stuff = [[y := f(x), x/y] for x in range(5)]

# There are a number of less obvious ways to spell this in current
# versions of Python.

# External helper function
def pair(x, value): return [value, x/value]
stuff = [pair(x, f(x)) for x in range(5)]

# Inline helper function
stuff = [(lambda y: [y,x/y])(f(x)) for x in range(5)]

# Extra 'for' loop - potentially could be optimized internally
stuff = [[y, x/y] for x in range(5) for y in [f(x)]]

# Iterating over a genexp
stuff = [[y, x/y] for x, y in ((x, f(x)) for x in range(5))]

# Expanding the comprehension into a loop
stuff = []
for x in range(5):
    y = f(x)
    stuff.append([y, x/y])

# Wrapping the loop in a generator function
def g():
    for x in range(5):
        y = f(x)
        yield [y, x/y]
stuff = list(g())

# Using a mutable cache object (various forms possible)
c = {}
stuff = [[c.update(y=f(x)) or c['y'], x/c['y']] for x in range(5)]

If calling f(x) is expensive or has side effects, the clean operation of the list comprehension gets muddled. Using a short-duration name binding retains the simplicity; while the extra for loop does achieve this, it does so at the cost of dividing the expression visually, putting the named part at the end of the comprehension instead of the beginning.

Similarly, a list comprehension can map and filter efficiently by capturing the condition::

results = [(x, y, x/y) for x in input_data if (y := f(x)) > 0]

Capturing condition values

Assignment expressions can be used to good effect in the header of an if or while statement::

# Proposed syntax
while (command := input("> ")) != "quit":
    print("You entered:", command)

# Capturing regular expression match objects
# See, for instance, Lib/pydoc.py, which uses a multiline spelling
# of this effect
if match := re.search(pat, text):
    print("Found:", match.group(0))

# Reading socket data until an empty string is returned
while data := sock.read():
    print("Received data:", data)

# Equivalent in current Python, not caring about function return value
while input("> ") != "quit":
    print("You entered a command.")

# To capture the return value in current Python demands a four-line
# loop header.
while True:
    command = input("> ");
    if command == "quit":
        break
    print("You entered:", command)

Particularly with the while loop, this can remove the need to have an infinite loop, an assignment, and a condition. It also creates a smooth parallel between a loop which simply uses a function call as its condition, and one which uses that as its condition but also uses the actual value.

Rejected alternative proposals

Proposals broadly similar to this one have come up frequently on python-ideas. Below are a number of alternative syntaxes, some of them specific to comprehensions, which have been rejected in favour of the one given above.

Alternative spellings

Broadly the same semantics as the current proposal, but spelled differently.

EXPR as NAME, with or without parentheses::
```
stuff = [[f(x) as y, x/y] for x in range(5)]
```
Omitting the parentheses in this form of the proposal introduces many syntactic ambiguities. Requiring them in all contexts leaves open the option to make them optional in specific situations where the syntax is unambiguous (cf generator expressions as sole parameters in function calls), but there is no plausible way to make them optional everywhere.

With the parentheses, this becomes a viable option, with its own tradeoffs in syntactic ambiguity. Since EXPR as NAME already has meaning in except and with statements (with different semantics), this would create unnecessary confusion or require special-casing (most notably of with and except statements, where a nearly-identical syntax has different semantics).
EXPR -> NAME::
```
stuff = [[f(x) -> y, x/y] for x in range(5)]
```
This syntax is inspired by languages such as R and Haskell, and some programmable calculators. (Note that a left-facing arrow y <- f(x) is not possible in Python, as it would be interpreted as less-than and unary minus.) This syntax has a slight advantage over 'as' in that it does not conflict with with and except statements, but otherwise is equivalent.
Adorning statement-local names with a leading dot::
```
stuff = [[(f(x) as .y), x/.y] for x in range(5)] # with "as"
stuff = [[(.y := f(x)), x/.y] for x in range(5)] # with ":="
```
This has the advantage that leaked usage can be readily detected, removing some forms of syntactic ambiguity. However, this would be the only place in Python where a variable's scope is encoded into its name, making refactoring harder. This syntax is quite viable, and could be promoted to become the current recommendation if its advantages are found to outweigh its cost.
Adding a where: to any statement to create local name bindings::
```
value = x**2 + 2*x where:
    x = spam(1, 4, 7, q)
```
Execution order is inverted (the indented body is performed first, followed by the "header"). This requires a new keyword, unless an existing keyword is repurposed (most likely with:). See PEP 3150 for prior discussion on this subject (with the proposed keyword being given:).
TARGET from EXPR::
```
stuff = [[y from f(x), x/y] for x in range(5)]
```
This syntax has fewer conflicts than as does (conflicting only with the raise Exc from Exc notation), but is otherwise comparable to it. Instead of paralleling with expr as target: (which can be useful but can also be confusing), this has no parallels, but is evocative.

Special-casing conditional statements

One of the most popular use-cases is if and while statements. Instead of a more general solution, this proposal enhances the syntax of these two statements to add a means of capturing the compared value::

if re.search(pat, text) as match:
    print("Found:", match.group(0))

This works beautifully if and ONLY if the desired condition is based on the truthiness of the captured value. It is thus effective for specific use-cases (regex matches, socket reads that return '' when done), and completely useless in more complicated cases (eg where the condition is f(x) < 0 and you want to capture the value of f(x)). It also has no benefit to list comprehensions.

Advantages: No syntactic ambiguities. Disadvantages: Answers only a fraction of possible use-cases, even in if/while statements.

Special-casing comprehensions

Another common use-case is comprehensions (list/set/dict, and genexps). As above, proposals have been made for comprehension-specific solutions.

where, let, or given::
```
stuff = [(y, x/y) where y = f(x) for x in range(5)]
stuff = [(y, x/y) let y = f(x) for x in range(5)]
stuff = [(y, x/y) given y = f(x) for x in range(5)]
```
This brings the subexpression to a location in between the 'for' loop and the expression. It introduces an additional language keyword, which creates conflicts. Of the three, where reads the most cleanly, but also has the greatest potential for conflict (eg SQLAlchemy and numpy have where methods, as does tkinter.dnd.Icon in the standard library).
with NAME = EXPR::
```
stuff = [(y, x/y) with y = f(x) for x in range(5)]
```
As above, but reusing the with keyword. Doesn't read too badly, and needs no additional language keyword. Is restricted to comprehensions, though, and cannot as easily be transformed into "longhand" for-loop syntax. Has the C problem that an equals sign in an expression can now create a name binding, rather than performing a comparison. Would raise the question of why "with NAME = EXPR:" cannot be used as a statement on its own.
with EXPR as NAME::
```
stuff = [(y, x/y) with f(x) as y for x in range(5)]
```
As per option 2, but using as rather than an equals sign. Aligns syntactically with other uses of as for name binding, but a simple transformation to for-loop longhand would create drastically different semantics; the meaning of with inside a comprehension would be completely different from the meaning as a stand-alone statement, while retaining identical syntax.

Regardless of the spelling chosen, this introduces a stark difference between comprehensions and the equivalent unrolled long-hand form of the loop. It is no longer possible to unwrap the loop into statement form without reworking any name bindings. The only keyword that can be repurposed to this task is with, thus giving it sneakily different semantics in a comprehension than in a statement; alternatively, a new keyword is needed, with all the costs therein.

Lowering operator precedence

There are two logical precedences for the := operator. Either it should bind as loosely as possible, as does statement-assignment; or it should bind more tightly than comparison operators. Placing its precedence between the comparison and arithmetic operators (to be precise: just lower than bitwise OR) allows most uses inside while and if conditions to be spelled without parentheses, as it is most likely that you wish to capture the value of something, then perform a comparison on it::

pos = -1
while pos := buffer.find(search_term, pos + 1) >= 0:
    ...

Once find() returns -1, the loop terminates. If := binds as loosely as = does, this would capture the result of the comparison (generally either True or False), which is less useful.

While this behaviour would be convenient in many situations, it is also harder to explain than "the := operator behaves just like the assignment statement", and as such, the precedence for := has been made as close as possible to that of =.

Migration path

The semantic changes to list/set/dict comprehensions, and more so to generator expressions, may potentially require migration of code. In many cases, the changes simply make legal what used to raise an exception, but there are some edge cases that were previously legal and now are not, and a few corner cases with altered semantics.

Yield inside comprehensions

As of Python 3.7, the outermost iterable in a comprehension is permitted to contain a 'yield' expression. If this is required, the iterable (or at least the yield) must be explicitly elevated from the comprehension::

# Python 3.7
def g():
    return [x for x in [(yield 1)]]
# With PEP 572
def g():
    sent_item = (yield 1)
    return [x for x in [sent_item]]

This more clearly shows that it is g(), not the comprehension, which is able to yield values (and is thus a generator function). The entire comprehension is consistently in a single scope.

Name reuse inside comprehensions

If the same name is used in the outermost iterable and also as an iteration variable, this will now raise UnboundLocalError when previously it referred to the name in the surrounding scope. Example::

# Lib/typing.py
tvars = []
for t in types:
    if isinstance(t, TypeVar) and t not in tvars:
        tvars.append(t)
    if isinstance(t, _GenericAlias) and not t._special:
        tvars.extend([ty for ty in t.__parameters__ if ty not in tvars])

If the list comprehension uses the name t rather than ty, this will work in Python 3.7 but not with this proposal. As with other unwanted name shadowing, the solution is to use distinct names.

Name lookups in class scope

A comprehension inside a class previously was able to 'see' class members ONLY from the outermost iterable. Other name lookups would ignore the class and potentially locate a name at an outer scope::

pattern = "<%d>"
class X:
    pattern = "[%d]"
    numbers = [pattern % n for n in range(5)]

In Python 3.7, X.numbers would show angle brackets; with PEP 572, it would show square brackets. Maintaining the current behaviour here is best done by using distinct names for the different forms of pattern, as would be the case with functions.

Generator expression bugs can be caught later

Certain types of bugs in genexps were previously caught more quickly. Some are now detected only at first iteration::

gen = (x for x in rage(10)) # NameError
gen = (x for x in 10) # TypeError (not iterable)
gen = (x for x in range(1/0)) # Exception raised during evaluation

This brings such generator expressions in line with a simple translation to function form::

def <genexp>():
    for x in rage(10):
        yield x
gen = <genexp>() # No exception yet
tng = next(gen) # NameError

Detecting these errors more quickly is nontrivial. It is, however, the exact same problem as generator functions currently suffer from, and this proposal brings the genexp in line with the most natural longhand form.

Open questions

Can the outermost iterable still be evaluated early?

As of Python 3.7, the outermost iterable in a genexp is evaluated early, and the result passed to the implicit function as an argument. With PEP 572, this would no longer be the case. Can we still, somehow, evaluate it before moving on? One possible implementation would be::

gen = (x for x in rage(10))
# translates to
def <genexp>():
    iterable = iter(rage(10))
    yield None
    for x in iterable:
        yield x
gen = <genexp>()
next(gen)

This would pump the iterable up to just before the loop starts, evaluating exactly as much as is evaluated outside the generator function in Py3.7. This would result in it being possible to call gen.send() immediately, unlike with most generators, and may incur unnecessary overhead in the common case where the iterable is pumped immediately (perhaps as part of a larger expression).

Importing names into comprehensions

A list comprehension can use and update local names, and they will retain their values from one iteration to another. It would be convenient to use this feature to create rolling or self-effecting data streams::

progressive_sums = [total := total + value for value in data]

This will fail with UnboundLocalError due to total not being initalized. Simply initializing it outside of the comprehension is insufficient - unless the comprehension is in class scope::

class X:
    total = 0
    progressive_sums = [total := total + value for value in data]

At other scopes, it may be beneficial to have a way to fetch a value from the surrounding scope. Should this be automatic? Should it be controlled with a keyword? Hypothetically (and using no new keywords), this could be written::

total = 0
progressive_sums = [total := total + value
    import nonlocal total
    for value in data]

Translated into longhand, this would become::

total = 0
def <listcomp>(total=total):
    result = []
    for value in data:
        result.append(total := total + value)
    return result
progressive_sums = <listcomp>()

ie utilizing the same early-binding technique that is used at class scope.

Frequently Raised Objections

Why not just turn existing assignment into an expression?

C and its derivatives define the = operator as an expression, rather than a statement as is Python's way. This allows assignments in more contexts, including contexts where comparisons are more common. The syntactic similarity between if (x == y) and if (x = y) belies their drastically different semantics. Thus this proposal uses := to clarify the distinction.

This could be used to create ugly code!

So can anything else. This is a tool, and it is up to the programmer to use it where it makes sense, and not use it where superior constructs can be used.

With assignment expressions, why bother with assignment statements?

The two forms have different flexibilities. The := operator can be used inside a larger expression; the = statement can be augmented to += and its friends. The assignment statement is a clear declaration of intent: this value is to be assigned to this target, and that's it.

Why not use a sublocal scope and prevent namespace pollution?

Previous revisions of this proposal involved sublocal scope (restricted to a single statement), preventing name leakage and namespace pollution. While a definite advantage in a number of situations, this increases complexity in many others, and the costs are not justified by the benefits. In the interests of language simplicity, the name bindings created here are exactly equivalent to any other name bindings, including that usage at class or module scope will create externally-visible names. This is no different from for loops or other constructs, and can be solved the same way: del the name once it is no longer needed, or prefix it with an underscore.

Names bound within a comprehension are local to that comprehension, even in the outermost iterable, and can thus be used freely without polluting the surrounding namespace.

Style guide recommendations

As this adds another way to spell some of the same effects as can already be done, it is worth noting a few broad recommendations. These could be included in PEP 8 and/or other style guides.

If either assignment statements or assignment expressions can be used, prefer statements; they are a clear declaration of intent.
If using assignment expressions would lead to ambiguity about execution order, restructure it to use statements instead.
Chaining multiple assignment expressions should generally be avoided. More than one assignment per expression can detract from readability.

Acknowledgements

The author wishes to thank Guido van Rossum and Nick Coghlan for their considerable contributions to this proposal, and to members of the core-mentorship mailing list for assistance with implementation.

References

.. [1] Proof of concept / reference implementation (https://github.com/Rosuav/cpython/tree/assignment-expressions)

Copyright

This document has been placed in the public domain.

.. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End:

Previous message (by thread): [Python-Dev] Basic test to validate win10 install?
Next message (by thread): [Python-Dev] PEP 572: Assignment Expressions
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the Python-Dev mailing list