gh-91049: Introduce set vectorcall field API for PyFunctionObject by adphrost · Pull Request #92257 · python/cpython (original) (raw)

circling back to @markshannon 's performance impact question. -- tldr Geometric mean: 1.00x faster

some more details:

rebased this PR onto a4460f2 and ran pyperformance on a4460f2 and on the rebased branch

both baseline and branch were built using ./configure --enable-optimizations --with-lto && make -j

used pyperformance 1.0.5 (installed directly from GitHub) on bare metal AWS machine (Linux-5.13.0-1023-aws-x86_64-with-glibc2.31 with 72 logical CPUs), and excluded the sqlalchemy benchmarks because they tried building greenlet and failed

pyperformance run --benchmarks=-sqlalchemy_declarative,-sqlalchemy_imperative --python builds/3.12-opt/python -o main.json --rigorous
pyperformance run --benchmarks=-sqlalchemy_declarative,-sqlalchemy_imperative --python builds/3.12-gh-91049-vectorcall-opt/python -o gh-91049.json --rigorous

compare results:

pyperf compare_to main.json gh-91049.json -G
Slower (20):
- regex_effbot: 3.02 ms +- 0.01 ms -> 3.17 ms +- 0.01 ms: 1.05x slower
- meteor_contest: 110 ms +- 2 ms -> 114 ms +- 4 ms: 1.04x slower
- pidigits: 200 ms +- 0 ms -> 208 ms +- 0 ms: 1.04x slower
- telco: 7.17 ms +- 0.21 ms -> 7.39 ms +- 0.34 ms: 1.03x slower
- html5lib: 64.0 ms +- 2.7 ms -> 65.6 ms +- 2.6 ms: 1.02x slower
- unpickle_list: 5.30 us +- 0.08 us -> 5.40 us +- 0.05 us: 1.02x slower
- pyflate: 430 ms +- 3 ms -> 438 ms +- 4 ms: 1.02x slower
- sympy_sum: 168 ms +- 2 ms -> 171 ms +- 2 ms: 1.02x slower
- xml_etree_iterparse: 110 ms +- 1 ms -> 112 ms +- 3 ms: 1.02x slower
- async_tree_memoization: 640 ms +- 25 ms -> 650 ms +- 25 ms: 1.01x slower
- chaos: 71.8 ms +- 0.7 ms -> 72.8 ms +- 0.6 ms: 1.01x slower
- hexiom: 6.71 ms +- 0.06 ms -> 6.80 ms +- 0.06 ms: 1.01x slower
- sympy_str: 299 ms +- 3 ms -> 302 ms +- 4 ms: 1.01x slower
- spectral_norm: 101 ms +- 1 ms -> 103 ms +- 1 ms: 1.01x slower
- sympy_integrate: 21.5 ms +- 0.2 ms -> 21.8 ms +- 0.2 ms: 1.01x slower
- logging_silent: 105 ns +- 1 ns -> 107 ns +- 3 ns: 1.01x slower
- raytrace: 303 ms +- 2 ms -> 306 ms +- 4 ms: 1.01x slower
- regex_v8: 23.1 ms +- 0.3 ms -> 23.3 ms +- 0.1 ms: 1.01x slower
- unpickle_pure_python: 239 us +- 2 us -> 240 us +- 2 us: 1.01x slower
- 2to3: 282 ms +- 2 ms -> 283 ms +- 1 ms: 1.00x slower

Faster (25):
- genshi_xml: 57.1 ms +- 2.7 ms -> 54.3 ms +- 1.8 ms: 1.05x faster
- scimark_sparse_mat_mult: 4.93 ms +- 0.06 ms -> 4.70 ms +- 0.10 ms: 1.05x faster
- scimark_lu: 112 ms +- 2 ms -> 109 ms +- 2 ms: 1.03x faster
- json_dumps: 13.3 ms +- 0.2 ms -> 12.9 ms +- 0.1 ms: 1.03x faster
- scimark_fft: 337 ms +- 3 ms -> 331 ms +- 3 ms: 1.02x faster
- nbody: 97.6 ms +- 1.8 ms -> 95.9 ms +- 1.8 ms: 1.02x faster
- pathlib: 23.0 ms +- 1.2 ms -> 22.6 ms +- 0.8 ms: 1.02x faster
- pickle_pure_python: 324 us +- 3 us -> 318 us +- 3 us: 1.02x faster
- genshi_text: 23.2 ms +- 0.4 ms -> 22.8 ms +- 0.3 ms: 1.02x faster
- unpack_sequence: 46.5 ns +- 0.6 ns -> 45.7 ns +- 0.7 ns: 1.02x faster
- fannkuch: 401 ms +- 6 ms -> 396 ms +- 4 ms: 1.01x faster
- chameleon: 7.06 ms +- 0.06 ms -> 6.98 ms +- 0.08 ms: 1.01x faster
- pickle_dict: 27.5 us +- 0.1 us -> 27.2 us +- 1.0 us: 1.01x faster
- pickle: 10.2 us +- 0.1 us -> 10.1 us +- 0.1 us: 1.01x faster
- xml_etree_parse: 170 ms +- 2 ms -> 168 ms +- 3 ms: 1.01x faster
- xml_etree_process: 56.8 ms +- 0.5 ms -> 56.3 ms +- 0.5 ms: 1.01x faster
- nqueens: 88.0 ms +- 1.1 ms -> 87.2 ms +- 1.1 ms: 1.01x faster
- json_loads: 26.3 us +- 0.3 us -> 26.1 us +- 0.2 us: 1.01x faster
- async_tree_none: 549 ms +- 15 ms -> 544 ms +- 12 ms: 1.01x faster
- async_tree_cpu_io_mixed: 771 ms +- 13 ms -> 766 ms +- 10 ms: 1.01x faster
- mako: 10.5 ms +- 0.0 ms -> 10.4 ms +- 0.1 ms: 1.01x faster
- crypto_pyaes: 77.2 ms +- 0.8 ms -> 76.8 ms +- 0.9 ms: 1.01x faster
- xml_etree_generate: 81.2 ms +- 0.6 ms -> 80.8 ms +- 0.5 ms: 1.00x faster
- regex_compile: 142 ms +- 1 ms -> 141 ms +- 1 ms: 1.00x faster
- scimark_sor: 121 ms +- 1 ms -> 120 ms +- 1 ms: 1.00x faster

Benchmark hidden because not significant (18): async_tree_io, deltablue, django_template, dulwich_log, float, go, logging_format, logging_simple, pickle_list, python_startup, python_startup_no_site, regex_dna, richards, scimark_monte_carlo, sqlite_synth, sympy_expand, tornado_http, unpickle

Geometric mean: 1.00x faster

thanks @ansley for doing the perf analysis on this :)