[Python-Dev] Informal educator feedback on PEP 572 (was Re: 2018 Python Language Summit coverage, last part)

Nick Coghlan ncoghlan at gmail.com
Sun Jul 1 02:11:41 EDT 2018

On 1 July 2018 at 14:32, Tim Peters <tim.peters at gmail.com> wrote:

> [Nick]
>
>> The PEP specifically cites this example as motivation:
>
> The PEP gives many examples. Your original was a strawman mischaracterization of the PEP's motivations (note the plural: you only mentioned "minor performance improvement", and snipped my listing of the major motivations).

I listed two motivations, not one:

Minor performance improvements (the "avoiding repeated subexpressions without using multiple statements" rationale)
Making certain coding patterns easier to spot (the loop-and-a-half and if-elif chaining cases)

Technically, avoiding repeated subexpressions without requiring a separate line also falls into the second category.

The subsequent interaction with comprehensions and generator expressions is an interesting side effect of extending the basic idea to a fully coherent and self-consistent proposal, not one of the original motivations for it.

>>     group = re.match(data).group(1) if re.match(data) else None
>>
>> That code's already perfectly straightforward to read and write as a single line,
>
> I disagree. In any case of textual repetition, it's a visual pattern-matching puzzle to identify the common substrings (I have to visually scan that line about 3 times to be sure), and then a potentially difficult conceptual puzzle to figure out whether side effects may result in textually identical substrings evaluating to different objects. That's why "referential transparency" is so highly valued in functional languages ("if subexpressions are spelled the same, they evaluate to the same result, period" - which isn't generally true in Python - to get that enormously helpful (to reasoning) guarantee in Python you have to ensure the subexpression is evaluated exactly once). And as you of all people should be complaining about, textual repetition is also prone to "oops - forgot one!" and "oops! made a typo when changing the second one!" when code is later modified.

That's a reasonable readability-based argument, but it's not what the PEP currently gives as a motivation for this aspect of the proposal.
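
The side-effect concern raised in the quoted text can be made concrete with a minimal sketch. The `next_token` helper and its token list are hypothetical, invented purely for illustration; the point is only that two textually identical calls need not evaluate to the same object:

```python
# Hypothetical illustration: a call with side effects is not referentially
# transparent, so repeating it textually does not repeat its value.
tokens = iter(["alpha", "beta", "gamma"])


def next_token():
    """Return the next item, advancing the shared iterator (a side effect)."""
    return next(tokens, None)


# The condition is evaluated first and consumes "alpha"; the value
# expression then calls next_token() again and consumes "beta".
repeated = next_token().upper() if next_token() else None

# Evaluating the call exactly once guarantees both uses see the same object.
tokens = iter(["alpha", "beta", "gamma"])
tok = next_token()
once = tok.upper() if tok else None
```

In this sketch `repeated` is derived from the second token while `once` is derived from the first, even though the two conditional expressions look equivalent at a glance.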

>> so the only reason to quibble about it
>
> I gave you three better reasons to quibble about it just above ;-)

Then add them to the PEP, as what's currently there really isn't offering a compelling motivation for this aspect of the proposal :)

>> is because it's slower than the arguably less clear two-line alternative:
>>
>>     m = re.match(data)
>>     group = m.group(1) if m else None
>
> I find that much clearer than the one-liner above: the visual pattern matching is easier because the repeated substring is shorter and of much simpler syntactic structure; it guarantees by construction that the two instances of m evaluate to the same object, so there's no possible concern about that (it doesn't even matter if you bound re to some "non-standard" object that has nothing to do with Python's re module); and any later changes to the single instance of re.match(data) don't have to be repeated verbatim elsewhere. It's possible that it runs twice as fast too, but that's the least of my concerns.

I agree with this, but also think the two-line form is a perfectly acceptable way of spelling it, and a perfectly acceptable refactoring of the one-line form with duplicated subexpressions to improve maintainability.

> All of those advantages are retained in the one-liner too if an assignment expression can be used in it.

Sure, but the open design question is whether folks who would have written the one-liner with repeated subexpressions are any more likely, without prompting by a more experienced developer, to use an assignment expression to avoid the repetition than they are to use a separate preceding assignment statement.

That assessment of "What is the increased chance that the repeated subexpression will be avoided when the code is first written?" then gets traded off against the overall increase in language complexity arising from allowing name bindings in arbitrary subexpressions.
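
For reference, the three spellings being compared can be sketched side by side. This is only an illustration under stated assumptions: the pattern, the input string, and the counting `match` wrapper are hypothetical stand-ins for the `re.match(data)` shorthand used in the thread, and the third form uses the `:=` syntax as PEP 572 proposes it (it needs Python 3.8 or later to run):

```python
import re

# Hypothetical pattern and input, chosen only for illustration.
pattern = re.compile(r"(\w+)@")
data = "ncoghlan@example.invalid"

calls = 0


def match(text):
    """Wrap pattern.match so the number of evaluations can be counted."""
    global calls
    calls += 1
    return pattern.match(text)


# 1. One-liner with a repeated subexpression: match() runs twice on success.
calls = 0
group = match(data).group(1) if match(data) else None
repeated_calls = calls

# 2. Two-statement form: match() runs once.
calls = 0
m = match(data)
group_two = m.group(1) if m else None
two_statement_calls = calls

# 3. Assignment expression: still one line, but match() runs once.
calls = 0
group_walrus = m2.group(1) if (m2 := match(data)) else None
walrus_calls = calls
```

All three bind the same result; only the first evaluates the match twice. Whether the third spelling is more likely than the second to be chosen unprompted is exactly the open question.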

Don't get me wrong, I now agree that the proposal in PEP 572 is the most coherent and self-consistent approach to assignment expressions that we could pursue given the existing scoping semantics of comprehensions and generator expressions.

The remaining point of contention is only the "Inevitable cost of change" one: given the level of disruption this will cause in the way that Python gets taught to new users, is it giving a commensurate pay-off in increased semantic expressiveness?

My answer to that question remains "No", while your answer is either "Yes" or "I don't see why that should matter" (I'm genuinely unsure which).

>>> sometimes it allows a more compact way of reusing an expensive subexpression by giving it a name.
>>
>> Which they already do by giving it a name in a separate statement, so the possible improvement would be in brevity rather than performance.
>
> You already realized the performance gain could be achieved by using two statements. The additional performance gain by using assignment expressions is at best trivial (it may save a LOAD_FAST opcode to fetch the object bound to m for the if test). So, no, gaining performance is not the motivation here. You already had a way to make it "run fast". The motivation is the brevity assignment expressions allow while retaining all of the two-statement form's advantages in easier readability, easier reasoning, reduced redundancy, and performance.

I never said the motivation was to gain performance relative to the two-statement version - I said the motivation given in the PEP is to gain performance relative to the repeated subexpression version, without making the transition to the already supported two-statement version.

> As Guido said, in the PEP, of the example you gave here:
>
>     Guido found several examples where a programmer repeated a subexpression, slowing down the program, in order to save one line of code
>
> It couldn't possibly be clearer that Guido thought the programmer's motivation was brevity ("in order to save one line of code"). Guido only happened to mention that they were willing to slow down the code to get that brevity, but, as above, they were also willing to make the code harder to read, reason about, and maintain. With the assignment expression, they don't have to give up any of the latter to get the brevity they mistakenly think ;-) they care most about - and, indeed, they can make it even briefer.

The quoted paragraph from the PEP clearly states that the repeated subexpression is considered a problem because it slows down the program, not because it repeats code.

As noted above, the PEP could certainly be updated to point out that repeating subexpressions is problematic for more reasons than just speed, but that isn't what it currently says.

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
