[Python-Dev] PEP 572: Write vs Read, Understand and Control Flow
Tim Peters tim.peters at gmail.com
Tue Apr 24 21:10:49 EDT 2018
[Victor Stinner] ...
Tim Peters gave the following example. "LONG" version:
    diff = x - x_base
    if diff:
        g = gcd(diff, n)
        if g > 1:
            return g

versus the "SHORT" version:

    if (diff := x - x_base) and (g := gcd(diff, n)) > 1:
        return g

== Write ==

If your job is to write code: the SHORT version can be preferred since it's closer to what you have in mind and the code is shorter. When you read your own code, it seems straightforward and you like to see everything on the same line.
All so, but a bit more: in context, this is just one block in a complex algorithm. The amount of vertical screen space it consumes directly affects how much of what comes before and after it can be seen without scrolling. Understanding this one block in isolation is approximately useless unless you can also see how it fits into the whole. Saving 3 lines of 5 is substantial, but it's more often saving 1 of 5 or 6. Regardless, they add up.
The LONG version looks like your expressiveness is limited by the computer. It's like having to use simple words when you talk to a child, because a child is unable to understand more subtle and advanced sentences. You want to write beautiful code for adults, right?
I want the whole to be as transparent as possible. That's a complicated balancing act in practice.
== Read and Understand ==
In my professional experience, I spend most of my time reading code rather than writing it. By reading, I mean: trying to understand why this specific bug that cannot occur... is always reproduced by the customer, whereas we fail to reproduce it in our test lab :-) This bug is impossible, you know it, right? So let's say that you have never read the example before, and it has a bug.
Then you're screwed - pay me to fix it ;-) Seriously, as above, this block on its own is senseless without understanding both the mathematics behind what it's doing, and how all the code before it picked x and x_base to begin with.
By "reading the code", I really mean understanding here. In your opinion, which version is easier to understand, without actually running the code?
Honestly, I find the shorter version a bit easier to understand: fewer indentation levels, and less semantically empty repetition of names.
IMHO the LONG version is simpler to understand, since the code is straightforward and it's easy to "guess" the control flow (that is, the order in which instructions will be executed).
You're saying you don't know that in "x and y" Python evaluates x first, and only evaluates y if x "is truthy"? Sorry, but this seems trivial to me in either spelling.
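A quick sketch of that, with a throwaway helper invented here purely to make the evaluation order visible:

    # noisy() is a made-up helper that announces when it runs.
    def noisy(label, value):
        print("evaluating", label)
        return value

    if noisy("left", 0) and noisy("right", 1):
        pass
    # Prints only "evaluating left": the right operand never runs.
    # The SHORT spelling follows the same rule - gcd(diff, n) is evaluated
    # only when diff is truthy, exactly like the nested LONG spelling.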
Print the code on paper and try to draw lines to follow the control flow. It may make it easier to see why SHORT is harder to follow than LONG.
Since they're semantically identical, there's something suspect about a conclusion that one is necessarily harder to understand than the other ;-) I don't have a problem with you finding the longer version easier to understand, but I do have a problem if you have a problem with me finding the shorter easier.
== Debug ==
Now let's imagine that you can run the code (someone succeeded in reproducing the bug in the test lab!). Since it has a bug, you now likely want to understand why the bug occurs using a debugger. Sadly, most debuggers are designed as if a single line of code can only execute a single instruction. I tried pdb: you cannot run only (diff := x - x_base) and then inspect the value of "diff" before running the second assignment; you can only execute the full line at once. I would say that the LONG version is easier to debug, at least using pdb.
That might be a good reason to avoid, say, list comprehensions (highly complex expressions of just about any kind), but I think this overlooks the primary point of "binding expressions": to give names to intermediate results. I couldn't care less if pdb executes the whole "if" statement in one gulp, because I get exactly the same info either way: the names diff and g bound to the results of the expressions they named. What actual difference does it make whether pdb binds the names one at a time, or both, before it returns to the prompt?
Binding expressions are debugger-friendly in that they don't just vanish without a trace. It's their purpose to capture the values of the expressions they name. Indeed, you may want to add them all over the place inside expressions, never intending to use the names, just so that you can see otherwise-ephemeral intra-expression results in your debugger ;-)
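A sketch of that trick, with made-up data and names - the walrus exists here only so a debugger (or a plain print) can see the intermediate value:

    # Invented example: capture an otherwise-ephemeral subexpression.
    values = [3.0, 4.5, 6.0]
    mean = sum(values) / len(values)
    scale = 2.0
    x = 4.5
    result = abs((centered := scale * (x - mean)))
    # "centered" now holds the intra-expression result you'd otherwise
    # have to recompute by hand at the pdb prompt.
    print(centered, result)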
... Think about tracebacks. If you get an exception at "line 1" in the SHORT example (the long "if" expression), what can you deduce from the line number? What happened?
If you get an exception in the LONG example, the line number gives you a little bit more information... maybe just enough to understand the bug?
This one I wholly agree with, in general. In the specific example at hand, it's weak, because there's so little that could raise an exception. For example, if the variables weren't bound to integers, in context the code would have blown up long before reaching this block. Python ints are unbounded, so overflow in "-" or "gcd" aren't possible either. MemoryError is theoretically possible, and in that case it would be good to know whether it happened during "-" or during "gcd()". Good to know, but not really helpful, because either way you ran out of memory :-(
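To make the line-number point concrete, a contrived sketch - the bad input is invented purely to force an exception:

    from math import gcd

    x, x_base, n = 12, "oops", 30   # deliberately broken input

    # SHORT: whichever subexpression fails, the traceback names this one line:
    #     if (diff := x - x_base) and (g := gcd(diff, n)) > 1: ...

    # LONG: the traceback line number separates the "-" from the gcd() call.
    diff = x - x_base        # the TypeError is reported here
    if diff:
        g = gcd(diff, n)     # ...rather than here
        if g > 1:
            print(g)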
== Write code for babies! ==
Please don't write code for yourself, but write code for babies! :-) These babies are going to maintain your code for the next 5 years, while you have moved to a different team or project in the meantime. Be kind to your coworkers and juniors! I try to write a single instruction per line whenever possible, even if the language in use allows me much more complex expressions. Even though the C language allows assignments in if, I avoid them, because I regularly have to debug my own code in gdb ;-) Now the question is which Python features are allowed for babies. I recall that a colleague was surprised and confused by context managers. Does it mean that try/finally should be preferred? What about f'Hello {name.title()}', which calls a method inside a "string" (formatting)? Or metaclasses? I guess that the limit should depend on your team, and may be spelled out in the coding style guide agreed on by your whole team?
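For reference, the two spellings behind that context-manager puzzle - a sketch with an invented file name:

    # "notes.txt" is a made-up name for illustration.
    with open("notes.txt", "w") as f:
        f.write("hello\n")

    # The try/finally spelling a confused colleague might prefer:
    f = open("notes.txt", "w")
    try:
        f.write("hello\n")
    finally:
        f.close()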
It's the kind of thing I prefer to leave to team style guides, because consensus will never be reached. In a different recent thread, someone complained about using functions at all, because their names are never wholly accurate, and in any case they hide what's "really" going on. To my eyes, that was an unreasonably extreme "write code for babies" position.
If a style guide banned using "and" or "or" in Python "if" or "while" tests, I'd find that less extreme, but also unreasonable.
But if a style guide banned functions with more than 50 formal arguments, I'd find that unreasonably tolerant.
Luckily, I only have to write code for me now, so am free to pick the perfect compromise in every case ;-)