gh-97933: inline list/dict/set comprehensions by carljm · Pull Request #101441 · python/cpython (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation73 Commits56 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
Closes #97933.
In builds with --enable-optimizations
:
➜ ./python -m pyperf timeit -s 'l = [1, 2, 3, 4, 5]' '[x for x in l]' --compare-to ../mainopt/python
/home/carljm/build/mainopt/python: ..................... 200 ns +- 3 ns
/home/carljm/build/ic2opt/python: ..................... 120 ns +- 2 ns
Mean +- std dev: [/home/carljm/build/mainopt/python] 200 ns +- 3 ns -> [/home/carljm/build/ic2opt/python] 120 ns +- 2 ns: 1.67x faster
Credit to @vladima for authoring the original version of this code in Cinder 3.8. The compiler approach is different in this version (so as to support inlining all comprehensions, regardless of name clashes); some of the symbol table changes remain the same.
A couple implementation notes:
- We need a new opcode
LOAD_FAST_AND_CLEAR
because we need to push/pop the value of possibly-undefined variables on the stack while clearing them. To do this with existing opcodes would require eliminating the invariant/assert thatLOAD_FAST
never loadsNULL
, and would add inefficient refcount operations and interpreter loop cycles. - If a comprehension iteration variable is a cell inside the comprehension (i.e. the comprehension builds closures) we end up including that variable in both
co_varnames
andco_cellvars
for the inlined-into outer function and using the non-offset (varnames) storage location for it. This is how function arguments that are also cells are currently handled, so we just piggyback on that handling. This makes the needed push/pop of the outer cell simpler as we can just useLOAD_FAST_AND_CLEAR/STORE_FAST
as-is for it, and it will push/pop the cell itself. Otherwise we would need some new handling in the compiler for allowing push/pop of a cell in offset storage.
This PR also adds a lot of new tests for comprehension scoping edge cases, and runs all new and existing scoping
tests in module, function, and class scopes (without requiring duplication of the test code.) All comprehension
scoping tests added in this diff also pass with comprehension inlining disabled (ensuring semantics did not change.)
ponimas, NoahTheDuke, 5j9, erlend-aasland, and bdraco reacted with heart emoji itamaro, TeamSpen210, vladima, sigma67, CAM-Gerlach, adamchainz, 5j9, corona10, erlend-aasland, scastlara, and 2 more reacted with rocket emoji
carljm changed the title
gh-97933: inline sync list/dict/set comprehensions gh-97933: inline list/dict/set comprehensions in functions
I'll run the benchmarks now
Benchmarking is delayed a bit, but should be back tomorrow at the latest.
You don't seem to be clearing the inner variable before use, so that
def f(): l = [ None ] [ (l[0], l) in [[1,2]] ] print(l)
will, I think, print [1]
instead of raising an UnboundLocalError
You can actually make the code more efficient by clearing the local when stashing it on the stack:
inst(LOAD_FAST_AND_CLEAR, ( -- val)) { val = LOCALS[oparg]; LOCALS[oparg] = NULL; }
as there is no need for a NULL
check or INCREF
.
A couple of obscure issues, but this looks like a very nice improvement.
I think the gain in performance is well worth the cost in code size, given how common comprehensions are.
You don't seem to be clearing the inner variable before use
Your example code doesn't involve a comprehension, I'm assuming you meant something like
def f():
l = [ None ]
[ (l[0], l) for l in [[1,2]] ]
print(l)
But this doesn't raise UnboundLocalError
on main either, it works fine; the iteration variable l
is always bound within the comprehension before the expression is evaluated.
So far I haven't been able to construct a test case where failing to clear an incoming variable bound inside the comprehension has any visible effect, because AFAICT any locals in a comprehension are always (necessarily by the syntactic limitations of how a comprehension is built) defined before use. Let me know if you can think of a test case I'm missing!
You can actually make the code more efficient by clearing the local when stashing it on the stack
But I think this is sufficient reason regardless to justify switching from LOAD_FAST_OR_NULL
to LOAD_FAST_AND_CLEAR
, so I'll do that!
Actually I don't think LOAD_FAST_AND_CLEAR
will work, because of cases like this:
def f(x):
return [x for x in x]
We need to save/restore the outer value of x
, but we cannot clear it on entry because we are also relying on being able to load it when we evaluate the iterable part of the comprehension (which in the non-inlined case would have been evaluated outside the function and passed in as the .0
parameter.) So I think we need to stick with a non-clearing LOAD_FAST_OR_NULL
on entry.
If there were a correctness reason for needing to clear in other cases, I could add an additional DELETE_FAST
after the LOAD_FAST_OR_NULL
, but so far I haven't found any such correctness need, and clearing this way is not an efficiency improvement.
EDIT: there is also the option of doing clearing pushes and also moving evaluation of the outermost iterable to before the pushes; this would require some non-elidable SWAPs to preserve it as TOS through the pushes, though...
@markshannon I think LOAD_FAST_OR_NULL
could also become a pseudo-instruction that resolves to LOAD_FAST
(which would get around having it turned into LOAD_FAST_CHECK
), but we'd have to remove the assert(value != NULL)
in the LOAD_FAST
implementation. Do you prefer keeping that assertion at the cost of another (non-pseudo) opcode?
I've addressed the review comments. I have some TODOs for potential optimizations that could be done here or in a follow-up diff:
- Eliminate some unnecessary push/pops in cases where the value is never loaded after the comprehension. This would fall into the general category of dead store elimination. I'm not sure if we can actually do this or if it violates some guarantees about introspectable frame state in tracing?
- Better handling of SWAPs on the post-comprehension pops.
apply_static_swaps
can't help us here if the comprehension value is immediately returned (because we can't "swap" aRETURN_VALUE
-- this case would go away if we can do the dead store elimination, since if it's right before aRETURN_VALUE
it's clearly a dead store) or if it's immediately popped (because thePOP_TOP
will have a lineno of-1
andapply_static_swaps
will thus refuse to swap it). Instead of having oneSWAP 2
per pop, we could have a singleSWAP N
and do the last pop first. - Teach
add_checks_for_loads_of_uninitialized_variables
to understand and trace through the push/pop pairs.
Sorry for posting a gibberish example.
Here's what I should have posted:
def f(): l = [ None ] [ 1 for (l[0], l) in [[1,2]] ] print(l) f()
3.11
$ python3.11 ~/test/test.py
Traceback (most recent call last):
...
UnboundLocalError: cannot access local variable 'l' where it is not associated with a value
This PR
$ ./python ~/test/test.py
[1]
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for review!
Co-authored-by: Irit Katriel 1055913+iritkatriel@users.noreply.github.com
🤖 New build scheduled with the buildbot fleet by @carljm for commit a8425a6 🤖
If you want to schedule another build, you need to add the 🔨 test-with-refleak-buildbots label again.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see Jelle and Irit did the heavy review, so this is your PEP-7 bot chiming in for some style suggestions 🤖
Co-authored-by: Erlend E. Aasland erlend.aasland@protonmail.com
Co-authored-by: Erlend E. Aasland erlend.aasland@protonmail.com
🤖 New build scheduled with the buildbot fleet by @carljm for commit 95401fe 🤖
If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.
The s390x Fedora LTO + PGO PR and s390x Fedora LTO PR buildbots look generally unhealthy. The latter has a full disk.
AMD64 Debian root PR is weird. The failure looks a bit flaky (segfault in test_threads_join
) and unlikely to be related, but that failure isn't showing up on any of its other builds. Requested a re-build to see if it repros.
The buildbot worker hosting the various s390x Fedora ...
builds appears to be out of disk space and unhealthy. So I'm satisfied that there aren't any buildbot issues caused by this PR.
Merging!
carljm added a commit to carljm/cpython that referenced this pull request
carljm added a commit to carljm/cpython that referenced this pull request
carljm added a commit to carljm/cpython that referenced this pull request
This was referenced
Apr 10, 2024