[Python-Dev] The new and improved PEP 572, same great taste with 75% less complexity!
Chris Angelico rosuav at gmail.com
Wed Apr 25 04:23:00 EDT 2018
On Wed, Apr 25, 2018 at 4:55 PM, Nathaniel Smith <njs at pobox.com> wrote:
On Tue, Apr 24, 2018 at 8:31 AM, Chris Angelico <rosuav at gmail.com> wrote:
The most notable change since last posting is that the assignment target is no longer as flexible as with the statement form of assignment, but is restricted to a simple name.
Note that the reference implementation has not been updated.

I haven't read most of the discussion around this, so my apologies if I say anything that's redundant. But since it seems like this might be starting to converge, I just read through it for the first time, and have a few comments. First, though, let me say that this is a really nice document, and I appreciate the incredible amount of work it takes to write something like this and manage the discussions! Regardless of the final outcome it's definitely a valuable contribution.
Thank you, but I'm hoping to do more than just rejected PEPs. (I do have one co-authorship to my name, but that was something Guido started, so it kinda had a bit of an advantage there.) My reputation as champion of death march PEPs is not something I want to continue. :|
Concretely, I find it unnerving that two of these work, and one doesn't:
    # assignment on the right of usage
    results = [(x, y, x/y) for x in input_data if (y := f(x)) > 0]

    # assignment on the left of usage
    stuff = [[y := f(x), x/y] for x in range(5)]

    # assignment on the right of usage
    stuff = [[x/y, y := f(x)] for x in range(5)]
Fair point. However, this isn't because of assignment operators, but because of comprehensions. There are two important constructs in Python that have out-of-order evaluation:
    x if cond else y                     # evaluates cond before x
    [result for x in iterable if cond]   # evaluates result last
    target()[0] = value                  # evaluates target after value
... Amongst our weaponry ... err, I'll come in again.
Ahem. Most of Python is evaluated left-to-right, top-to-bottom, just as anyone familiar with a Latin-derived language would expect. There are exceptions, however, and those are generally on the basis that "practicality beats purity". Those exceptions include assignment, the if/else expression, and comprehensions/genexps. But even within those constructs, evaluation is left-to-right as much as it possibly can be. A list comprehension places the target expression first (exception to the LTR rule), but evaluates all its 'for' and 'if' clauses in order.
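The out-of-order constructs listed above can be checked with a small tracing sketch. The `trace` helper, the `target` function, and the labels are made up for illustration; they simply record the order in which each subexpression runs.

```python
order = []

def trace(label, value=None):
    # Hypothetical helper: record that this subexpression ran, then return its value.
    order.append(label)
    return value

# Conditional expression: the condition runs before the chosen branch.
order.clear()
_ = trace("x", 1) if trace("cond", True) else trace("y", 2)
cond_order = list(order)

# Comprehension: the iterable and the 'if' clause run before the result expression.
order.clear()
_ = [trace("result", item) for item in trace("iterable", [0]) if trace("cond", True)]
comp_order = list(order)

# Subscript assignment: the right-hand side runs before the target expression.
def target():
    order.append("target")
    return [None]

order.clear()
target()[0] = trace("value", 1)
assign_order = list(order)

print(cond_order, comp_order, assign_order)
```

In each case the textual order and the recorded order differ only at the one spot described above; everything else still runs left to right.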
We already have some strange cases as a result of this out-of-order evaluation. For instance:
reciprocals = [1/x for x in values if x]
The 'if x' on the right means we won't get a ZeroDivisionError from the expression on the left. Were this loop to be unrolled, it would look like this:
    def listcomp():
        result = []
        for x in values:
            if x:
                result.append(1/x)
        return result
    reciprocals = listcomp()
And you can easily audit the longhand form to confirm that, yes, "if x" comes before "1/x". It's the same with assignment expressions; the only exception to the "left before right" rule is the primary expression being evaluated after all for/if clauses. Unrolling your three examples gives (eliding the function wrappers for simplicity):
    # assignment on the right of usage
    for x in input_data:
        if (y := f(x)) > 0:
            results.append((x, y, x/y))

    # assignment on the left of usage
    for x in range(5):
        stuff.append([y := f(x), x/y])

    # assignment on the right of usage
    for x in range(5):
        stuff.append([x/y, y := f(x)])
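For anyone who wants to try the three cases directly, here is a minimal sketch. It assumes the assignment-expression semantics that later shipped in Python 3.8, and `f` is a stand-in helper chosen so that it never returns zero. Each comprehension is wrapped in its own function so that a `y` left over from one case cannot leak into the next.

```python
def f(x):
    # Hypothetical helper: any function returning a nonzero number would do.
    return x + 1

def assign_in_condition(input_data):
    # The 'if' clause runs before the result expression, so y is bound in time.
    return [(x, y, x / y) for x in input_data if (y := f(x)) > 0]

def assign_left_of_use():
    # Inside the result expression, list elements are evaluated left to right.
    return [[y := f(x), x / y] for x in range(5)]

def assign_right_of_use():
    # Here x / y is evaluated before y has ever been bound.
    return [[x / y, y := f(x)] for x in range(5)]

results = assign_in_condition([1, 2, 3])
stuff = assign_left_of_use()

try:
    assign_right_of_use()
except NameError as exc:
    # UnboundLocalError is a subclass of NameError, so either is caught here.
    failure = exc
else:
    failure = None

print(results)
print(stuff)
print(repr(failure))
```

The first two functions return normally; the third fails on its first iteration, which matches what the unrolled loops suggest.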
Were list comprehensions wrong to put the expression first? I'm not sure, but I can't see a better way to write them; at best, you'd end up with something like:
[for x in numbers if x % 2: x * x]
which introduces its own sources of confusion (though I have to say, it would look pretty clean in the "one for loop, no conditions" case). But whether they're right or wrong, they're what we have, and side effects are already legal, and assignment expressions are just another form of side effect.
I guess this isn't limited to comprehensions either – I rarely see complex expressions with side-effects embedded in the middle, so I'm actually a bit hazy about the exact order of operations inside Python expressions. I could probably figure it out if necessary, but even in simple cases like f(g(), h()), then do you really know for certain off the top of your head whether g() or h() runs first? Does the average user? With code like f(a := g(), h(a)) this suddenly matters a lot! But comprehensions suffer from a particularly extreme version of this, so it worries me that they're being put forward as the first set of motivating examples.
Honestly, I would fully expect that g() is run first, but I know there are more complicated cases. For instance, here are three ways to print "1" followed by "2", and create a dictionary mapping None to None:
    >>> x={}
    >>> x[print(2)] = print(1)
    1
    2
    >>> x={print(1): print(2)}
    1
    2
    >>> x={print(2): print(1) for _ in [1]}
    1
    2
Hmmmmmmmmm. One of these is not like the others...
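The f(g(), h()) question can be settled with a quick sketch rather than from memory. The functions below are placeholders that only log their own names; the second call assumes the assignment-expression syntax as it later shipped in Python 3.8.

```python
calls = []

def g():
    calls.append("g")
    return 2

def h(a=None):
    calls.append("h")
    return a

def f(first, second):
    calls.append("f")
    return (first, second)

# Plain call: arguments are evaluated left to right, then f itself runs.
plain = f(g(), h())
plain_order = list(calls)

# With an assignment expression, h() can see the value g() just produced.
calls.clear()
chained = f(a := g(), h(a))
chained_order = list(calls)

print(plain_order, chained_order, chained)
```

Both calls log g, then h, then f, so the name bound in the first argument is available to the second.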
Capturing condition values
--------------------------

Note to Chris: your examples in this section have gotten their order scrambled; you'll want to fix that :-). And I'm going to reorder them yet again in my reply...

    # Reading socket data until an empty string is returned
    while data := sock.read():
        print("Received data:", data)

I don't find this example very convincing. If it were written:

    for data in iter(sock.read, b""):
        ...

then that would make it clearer what's happening ("oh right, sock.read uses b"" as a sentinel to indicate EOF").
That's assuming that you want an equality comparison, which holds for the case you describe, but not in general. Recommending that people use iter() and a 'for' loop has a couple of consequences:
It uses a syntax that's distinctly unobvious. You're achieving the end result, but what exactly does it mean to iterate over sock.read? It doesn't read very cleanly.
You encourage the use of == as the one and only comparison. If you have an API that returns None when it's done, and you use "iter(func, None)", you're actually checking if the yielded value == None, not if it is None.
Suppose you want to change the check so that there are two termination conditions. Do you stick with iter() and then add the other check inside the loop? Do you rewrite the code back into the four-line loop header with the infinite loop, so you can use "in {x, y}" as your condition? iter() can solve a few problems, but not many of them, and not always correctly.
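The equality-versus-identity point can be shown with a deliberately contrived sketch. `LooksLikeNone` and `make_reader` are invented for the demonstration; the only real API used is the two-argument form of iter(), which stops when the returned value compares equal to the sentinel.

```python
class LooksLikeNone:
    # Contrived object that claims equality with None without being None.
    def __eq__(self, other):
        return other is None

    def __hash__(self):
        return 0

def make_reader(items):
    # Hypothetical stand-in for an API that returns None when it is done.
    iterator = iter(items)
    return lambda: next(iterator)

odd = LooksLikeNone()
items = [1, odd, 2, None]

# iter(func, None) stops at the first value that == None, which is `odd`.
via_iter = list(iter(make_reader(items), None))

# An explicit identity check only stops at the real None.
via_identity = []
read = make_reader(items)
while (item := read()) is not None:
    via_identity.append(item)

print(via_iter)
print(len(via_identity))
```

The iter() form silently drops everything from the odd object onward, while the explicit loop keeps all three values.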
And the fact that this is needed at all is only because sockets are a low-level API with lots of complexity inherited from BSD sockets. If this were a normal Python API, it'd just be

    for data in sock:
        ...
<chomp details of why it can't be that sort of API>
The complexity they inherit is fundamental to the nature of sockets, so that part isn't going to change. I've simplified it down to a coherent example, in the hopes that it'd make sense that way. Maybe I need a different example; people keep getting hung up on the details of sockets instead of the proposed syntax.
    # Proposed syntax
    while (command := input("> ")) != "quit":
        print("You entered:", command)

    # Equivalent in current Python, not caring about function return value
    while input("> ") != "quit":
        print("You entered a command.")

    # To capture the return value in current Python demands a four-line
    # loop header.
    while True:
        command = input("> ")
        if command == "quit":
            break
        print("You entered:", command)

Particularly with the ``while`` loop, this can remove the need to have an infinite loop, an assignment, and a condition. It also creates a smooth parallel between a loop which simply uses a function call as its condition, and one which uses that as its condition but also uses the actual value.

I dare you to describe that first version in English :-). I would say: "it reads the next line of input and stores it in 'command'; then checks if it was 'quit', and if so it exits the loop; otherwise, it prints the command".
While the command (from the input function) is not equal to "quit", print out "You entered:" and the command.
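To compare the two spellings without typing at a prompt, here is a sketch that swaps input() for a scripted reader. `make_input` is a made-up helper, and the loop bodies collect commands instead of printing them; the first function assumes the syntax as it later shipped in Python 3.8.

```python
def make_input(lines):
    # Hypothetical replacement for input(): replays a fixed list of lines.
    iterator = iter(lines)
    return lambda prompt="": next(iterator)

def run_with_assignment_expression(read):
    seen = []
    while (command := read("> ")) != "quit":
        seen.append(command)
    return seen

def run_with_four_line_header(read):
    seen = []
    while True:
        command = read("> ")
        if command == "quit":
            break
        seen.append(command)
    return seen

script = ["ls", "pwd", "quit", "never reached"]
short_form = run_with_assignment_expression(make_input(script))
long_form = run_with_four_line_header(make_input(script))
print(short_form, long_form)
```

Both loops stop at "quit" and see the same commands; the difference is only in how much of the loop header is spent on bookkeeping.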
(Plus in a real version of this you'd have some command line parsing to do – at least stripping off whitespace from the command, probably tokenizing it somehow – before you could check what the command was, and then you're back to the final version anyway.)
Not if this is a wire protocol. But again, people keep getting hung up on the specifics of the examples, saying "oh but if you change what you're doing, it won't fit the := operator any more". Well, yeah, of course that's true.
    # Capturing regular expression match objects
    # See, for instance, Lib/pydoc.py, which uses a multiline spelling
    # of this effect
    if match := re.search(pat, text):
        print("Found:", match.group(0))

Now this is a genuinely compelling example! re match objects are always awkward to work with. But this feels like a very big hammer to make re.match easier to use :-). I wonder if there's anything more focused we could do here?
Rejected proposals: Dedicated syntax for regular expression matching?
Special-casing conditional statements
-------------------------------------

One of the most popular use-cases is ``if`` and ``while`` statements. Instead of a more general solution, this proposal enhances the syntax of these two statements to add a means of capturing the compared value::

    if re.search(pat, text) as match:
        print("Found:", match.group(0))

This works beautifully if and ONLY if the desired condition is based on the truthiness of the captured value. It is thus effective for specific use-cases (regex matches, socket reads that return ``''`` when done), and completely useless in more complicated cases (eg where the condition is ``f(x) < 0`` and you want to capture the value of ``f(x)``). It also has no benefit to list comprehensions.

Advantages: No syntactic ambiguities. Disadvantages: Answers only a fraction of possible use-cases, even in ``if``/``while`` statements.

It does only cover a fraction of possible use-cases, but interestingly, the fraction it covers includes:

- two of the three real examples given in the rationale section
- exactly the cases that don't force you to twist your brain in pretzels thinking about sequential side-effecting control flow in the middle of expressions.
It is only able to cover the regex case because the regex API has been deliberately crafted to make this possible. If you need any sort of comparison other than "is the captured object truthy?", this syntax is insufficient.
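A short sketch of the two situations, assuming the assignment-expression syntax as it later shipped in Python 3.8. The pattern, the text, and `f` are invented for the example. The first test is a pure truthiness check, which an "as" form could also express; the second compares the captured value against something, which it could not.

```python
import re

text = "error code: -42"
found = None
message = None

# Truthiness check: re.search returns None on failure, a match object otherwise.
if match := re.search(r"-?\d+", text):
    found = match.group(0)

def f(x):
    # Hypothetical function whose result we want to both test and keep.
    return x - 10

# Comparison check: the condition is f(x) < 0, and the value of f(x) is captured.
if (value := f(3)) < 0:
    message = f"negative: {value}"

print(found, message)
```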
However, I do think it'd be kinda confusing if we had:
    if EXPR as X:
    while EXPR as X:
    with EXPR as X:

and the first two assign the value of EXPR to X, while the last one does something more subtle. Or maybe it'd be fine?
Nope, it definitely would not be fine. This is covered by your opening acknowledgement that you haven't read all the previous posts. :)
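For reference, the "something more subtle" is that a with statement binds whatever __enter__() returns, not the value of the expression itself. A tiny sketch with a made-up `Resource` class shows the difference:

```python
class Resource:
    # Hypothetical context manager whose __enter__ returns a different object.
    def __enter__(self):
        return "handle"

    def __exit__(self, exc_type, exc_value, traceback):
        return False

resource = Resource()
with resource as bound:
    inside = bound

print(inside, bound is resource)
```

So "with EXPR as X" leaves X holding "handle" here, whereas the hypothetical "if EXPR as X" would have bound the Resource object.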
ChrisA