O(N²) compile-time regression for large constant tuples after const folding moved to CFG
While migrating a project from 3.11 to 3.14, I hit a case where a data file
containing ~81,000 nested 2-tuples (think ((x, y), (x, y), ...)) caused the
compiler to hang.
Turns out this is a regression from the const folding migration (gh-126835). The root cause
is `add_const()` in `Python/flowgraph.c` — it does a linear scan over the `consts` list:
```c
for (index = 0; index < PyList_GET_SIZE(consts); index++) {
    if (PyList_GET_ITEM(consts, index) == newconst) {
        break;
    }
}
```
Before gh-126835, large constant tuples were folded in one shot at the AST stage,
so `add_const()` was rarely called with big `consts` lists. After
gh-130769 moved tuple folding
to CFG, `fold_tuple_of_constants()` calls `add_const()` once per inner tuple
— N calls × an O(N) scan = O(N²) total.
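For intuition, here's a minimal pure-Python model of that scan (the helper names are mine, not CPython API): since every freshly folded tuple is a new object, each call scans the entire current list before appending, so total comparisons grow as roughly N²/2.

```python
def add_const_scan(consts, newconst, counter):
    """Model of the C loop: linear identity scan over the consts list."""
    for index, existing in enumerate(consts):
        counter[0] += 1
        if existing is newconst:
            return index
    consts.append(newconst)
    return len(consts) - 1

def total_comparisons(n):
    consts, counter = [], [0]
    # One add_const call per freshly folded inner tuple, as in
    # fold_tuple_of_constants(): every constant is a new object, so
    # each call scans the whole list before appending.
    for i in range(n):
        add_const_scan(consts, (i,), counter)
    return counter[0]

print(total_comparisons(1000))   # 499500 = 1000*999/2
print(total_comparisons(2000))   # 1999000 — 4x the work for 2x N
```

Doubling N quadruples the comparison count, which is the quadratic blow-up seen in the profile.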
`perf` profiling shows `add_const()` taking 83.76% of CPU time at N=100K.
This isn’t limited to tuple folding — any CFG-stage constant folding that hits
`add_const()` is affected. Unary/binary op folding was also moved to CFG in
gh-129550, so patterns like
these show the same O(N²) behavior:

- `(-1, -2, ..., -N)` — each `UNARY_NEGATIVE` folded via `add_const()`, 31.7x slower than 3.11 at N=100K
- `(0+1, 0+2, ..., 0+N)` — each `BINARY_OP` folded via `add_const()`, 9.7x slower at N=100K
The issue is present on 3.15.0a7 as well.
Reproducer:
```python
import sys, time

N = int(sys.argv[1]) if len(sys.argv) > 1 else 100000

# Generate source: vertices = ((-0.001, -0.002), (-0.003, -0.004), ...)
elements = ", ".join(f"({i * 0.001:.6f}, {i * 0.002:.6f})" for i in range(N))
source = f"vertices = ({elements},)"

t0 = time.perf_counter()
code_obj = compile(source, "<test>", "exec")
t1 = time.perf_counter()
print(f"Compile time: {t1-t0:.4f}s", flush=True)
```
Proposed fix:
Add an auxiliary `_Py_hashtable_t` (pointer → index) alongside the `consts` list
in `_PyCfg_OptimizeCodeUnit`, replacing the linear scan with an O(1) lookup. It uses
`_Py_hashtable_hash_ptr` / `_Py_hashtable_compare_direct` — pure pointer operations, no
Python object overhead. The hashtable is created before `optimize_cfg()` and
destroyed right after, before `remove_unused_consts()` reindexes the list.

This works because `_PyCompile_ConstCacheMergeOne()` already guarantees identity
uniqueness (equal-valued constants share the same pointer), so a pointer-based lookup
is semantically equivalent to the existing identity scan. All affected functions are
`static` — no public API changes.
Results (N=100K):
| Scenario | 3.11 | 3.14 (before) | 3.14 (after fix) |
|---|---|---|---|
| Nested tuples `((f, f), ...)` | 1.24s | 10.38s | 1.48s |
| Unary neg `(-1, -2, ..., -N)` | 0.22s | 6.86s | 1.12s |
| Binary add `(0+1, 0+2, ..., 0+N)` | 0.27s | 2.59s | 1.40s |
All existing tests pass (`test_compile`, `test_peepholer`, `test_ast`, `test_dis`).
There’s also a further optimization opportunity — folding all-constant tuples
directly at codegen stage to avoid the instruction bloat entirely (closing the
remaining gap vs 3.11) — but that’s a separate discussion.
Draft fix: zSirius/cpython@120e518 (Fix O(N²) in `add_const()` after constant folding moved to CFG)
Thoughts?