gh-97933: inline list/dict/set comprehensions by carljm · Pull Request #101441 · python/cpython (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation73 Commits56 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

carljm

Closes #97933.

In builds with --enable-optimizations:

➜ ./python -m pyperf timeit -s 'l = [1, 2, 3, 4, 5]' '[x for x in l]' --compare-to ../mainopt/python
/home/carljm/build/mainopt/python: ..................... 200 ns +- 3 ns
/home/carljm/build/ic2opt/python: ..................... 120 ns +- 2 ns

Mean +- std dev: [/home/carljm/build/mainopt/python] 200 ns +- 3 ns -> [/home/carljm/build/ic2opt/python] 120 ns +- 2 ns: 1.67x faster

Credit to @vladima for authoring the original version of this code in Cinder 3.8. The compiler approach is different in this version (so as to support inlining all comprehensions, regardless of name clashes); some of the symbol table changes remain the same.

A couple implementation notes:

This PR also adds a lot of new tests for comprehension scoping edge cases, and runs all new and existing scoping
tests in module, function, and class scopes (without requiring duplication of the test code.) All comprehension
scoping tests added in this diff also pass with comprehension inlining disabled (ensuring semantics did not change.)

ponimas, NoahTheDuke, 5j9, erlend-aasland, and bdraco reacted with heart emoji itamaro, TeamSpen210, vladima, sigma67, CAM-Gerlach, adamchainz, 5j9, corona10, erlend-aasland, scastlara, and 2 more reacted with rocket emoji

@carljm

@carljm

@carljm

@carljm

@carljm

@carljm carljm changed the titlegh-97933: inline sync list/dict/set comprehensions gh-97933: inline list/dict/set comprehensions in functions

Jan 31, 2023

@carljm

@markshannon

I'll run the benchmarks now

@markshannon

Benchmarking is delayed a bit, but should be back tomorrow at the latest.

@markshannon

You don't seem to be clearing the inner variable before use, so that

def f(): l = [ None ] [ (l[0], l) in [[1,2]] ] print(l)

will, I think, print [1] instead of raising an UnboundLocalError

You can actually make the code more efficient by clearing the local when stashing it on the stack:

inst(LOAD_FAST_AND_CLEAR, ( -- val)) { val = LOCALS[oparg]; LOCALS[oparg] = NULL; }

as there is no need for a NULL check or INCREF.

markshannon

@markshannon

A couple of obscure issues, but this looks like a very nice improvement.

I think the gain in performance is well worth the cost in code size, given how common comprehensions are.

@carljm

@carljm

You don't seem to be clearing the inner variable before use

Your example code doesn't involve a comprehension, I'm assuming you meant something like

def f():
    l = [ None ]
    [ (l[0], l) for l in [[1,2]] ]
    print(l)

But this doesn't raise UnboundLocalError on main either, it works fine; the iteration variable l is always bound within the comprehension before the expression is evaluated.

So far I haven't been able to construct a test case where failing to clear an incoming variable bound inside the comprehension has any visible effect, because AFAICT any locals in a comprehension are always (necessarily by the syntactic limitations of how a comprehension is built) defined before use. Let me know if you can think of a test case I'm missing!

You can actually make the code more efficient by clearing the local when stashing it on the stack

But I think this is sufficient reason regardless to justify switching from LOAD_FAST_OR_NULL to LOAD_FAST_AND_CLEAR, so I'll do that!

@carljm

Actually I don't think LOAD_FAST_AND_CLEAR will work, because of cases like this:

def f(x):
    return [x for x in x]

We need to save/restore the outer value of x, but we cannot clear it on entry because we are also relying on being able to load it when we evaluate the iterable part of the comprehension (which in the non-inlined case would have been evaluated outside the function and passed in as the .0 parameter.) So I think we need to stick with a non-clearing LOAD_FAST_OR_NULL on entry.

If there were a correctness reason for needing to clear in other cases, I could add an additional DELETE_FAST after the LOAD_FAST_OR_NULL, but so far I haven't found any such correctness need, and clearing this way is not an efficiency improvement.

EDIT: there is also the option of doing clearing pushes and also moving evaluation of the outermost iterable to before the pushes; this would require some non-elidable SWAPs to preserve it as TOS through the pushes, though...

@carljm

@carljm

@markshannon I think LOAD_FAST_OR_NULL could also become a pseudo-instruction that resolves to LOAD_FAST (which would get around having it turned into LOAD_FAST_CHECK), but we'd have to remove the assert(value != NULL) in the LOAD_FAST implementation. Do you prefer keeping that assertion at the cost of another (non-pseudo) opcode?

@carljm

@carljm

I've addressed the review comments. I have some TODOs for potential optimizations that could be done here or in a follow-up diff:

@markshannon

Sorry for posting a gibberish example.
Here's what I should have posted:

def f(): l = [ None ] [ 1 for (l[0], l) in [[1,2]] ] print(l) f()

3.11

$ python3.11 ~/test/test.py 
Traceback (most recent call last):
   ...
UnboundLocalError: cannot access local variable 'l' where it is not associated with a value

This PR

$ ./python ~/test/test.py 
[1]

iritkatriel

carljm

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for review!

@carljm @iritkatriel

Co-authored-by: Irit Katriel 1055913+iritkatriel@users.noreply.github.com

@carljm

@carljm

@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @carljm for commit a8425a6 🤖

If you want to schedule another build, you need to add the 🔨 test-with-refleak-buildbots label again.

iritkatriel

erlend-aasland

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see Jelle and Irit did the heavy review, so this is your PEP-7 bot chiming in for some style suggestions 🤖

@carljm @erlend-aasland

Co-authored-by: Erlend E. Aasland erlend.aasland@protonmail.com

@carljm @erlend-aasland

Co-authored-by: Erlend E. Aasland erlend.aasland@protonmail.com

@carljm

@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @carljm for commit 95401fe 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@carljm

The s390x Fedora LTO + PGO PR and s390x Fedora LTO PR buildbots look generally unhealthy. The latter has a full disk.

AMD64 Debian root PR is weird. The failure looks a bit flaky (segfault in test_threads_join) and unlikely to be related, but that failure isn't showing up on any of its other builds. Requested a re-build to see if it repros.

@carljm

The buildbot worker hosting the various s390x Fedora ... builds appears to be out of disk space and unhealthy. So I'm satisfied that there aren't any buildbot issues caused by this PR.

Merging!

carljm added a commit to carljm/cpython that referenced this pull request

May 9, 2023

@carljm

carljm added a commit to carljm/cpython that referenced this pull request

May 9, 2023

@carljm

carljm added a commit to carljm/cpython that referenced this pull request

May 9, 2023

@carljm

This was referenced

Apr 10, 2024