Inline dict/list/set comprehensions in the compiler for better performance · Issue #97933 · python/cpython

Feature or enhancement

In Cinder we inline some list/dict/set comprehensions in the compiler for better performance. That is, instead of creating a function object every time and calling it, for some comprehensions we just emit bytecode directly in the outer function to implement the comprehension loop.
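The difference can be observed from Python itself by checking whether a comprehension shows up as a separate code object in the enclosing function's constants. The following is a minimal sketch; `nested_code_names` and `squares` are illustrative names, not part of CPython or Cinder.

```python
import types


def nested_code_names(func):
    """Return the names of code objects stored directly in func's constants."""
    return [
        const.co_name
        for const in func.__code__.co_consts
        if isinstance(const, types.CodeType)
    ]


def squares(n):
    # A plain list comprehension inside a function body.
    return [i * i for i in range(n)]


names = nested_code_names(squares)
print(names)
```

On an interpreter that compiles the comprehension as its own function, the list should contain an entry such as `<listcomp>`. Where the comprehension is inlined as described above, no nested code object is needed and the list is expected to be empty.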

Pitch

This change was a significant CPU efficiency improvement on the Instagram production web tier. Allocating a new Python function object every time a comprehension runs is a real cost, and not a strictly necessary one.

There are some backward-compatibility considerations. In the Cinder implementation, we refuse to inline a comprehension if a name assigned within it collides with a name in the containing scope. We also delete any names defined in the comprehension immediately after the inlined bytecode, so that those names keep the same visibility they have today. Some changes can still be observed. If locals() is called within the comprehension, it will show names from the containing function too. If a traceback occurs within the comprehension (or sys._getframe is called), it will not show an additional last frame for the comprehension itself. In practice we've not observed either of these to be a problem.
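The name-visibility behavior that inlining has to preserve can be shown with two small functions. This is a sketch of the existing language semantics, not of the Cinder implementation; `shadowing`, `no_leak`, and `item` are names chosen only for illustration.

```python
def shadowing():
    x = "outer value"
    # The loop variable x belongs to the comprehension, so the outer x
    # must be unchanged afterward. This is the name-collision case.
    squares = [x * x for x in range(3)]
    return x, squares


def no_leak():
    [item for item in range(3)]
    try:
        # item was only ever bound inside the comprehension, so it must
        # not be visible here (assumes no global called item exists).
        item
    except NameError:
        return True
    return False


print(shadowing())
print(no_leak())
```

An inlined comprehension runs in the containing function's frame, so both properties have to be maintained explicitly: the first by not inlining when names collide, the second by deleting the comprehension's names once it finishes.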

I discussed this idea with @markshannon, and he suggested a couple of possible improvements. Instead of refusing to inline comprehensions with name clashes, we could push the prior value of each colliding name onto the stack before running the comprehension, and then pop it back into the name afterward. This would allow inlining nearly all comprehensions. He also suggested that we could add PUSH_FRAME and POP_FRAME opcodes and wrap the inlined bytecode in these. We would still avoid the cost of creating a function object, but the comprehension bytecode (although part of the parent bytecode) would run inside its own frame, eliminating the incompatibilities mentioned above.
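The save-and-restore idea can be imitated by hand in ordinary Python. This is only a source-level analogy under stated assumptions: the real proposal would keep the prior value on the interpreter's value stack rather than in a local such as `saved_x`, and the function names here are hypothetical.

```python
def with_comprehension(n):
    x = "outer value"
    result = [x * x for x in range(n)]
    return x, result


def inlined_by_hand(n):
    x = "outer value"
    saved_x = x              # "push" the prior value of the colliding name
    result = []
    for x in range(n):       # the loop now rebinds x in the same scope
        result.append(x * x)
    x = saved_x              # "pop" the prior value back into the name
    return x, result


print(with_comprehension(3) == inlined_by_hand(3))
```

A complete version would also need to handle a colliding name that is unbound before the comprehension runs, which this sketch does not attempt.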

Linked PRs