bpo-32494: Use gdbm_count for dbm_length if possible by corona10 · Pull Request #19814 · python/cpython

With this PR, dbm_length uses gdbm_count when it is available, without exporting a new public API.

I ran benchmarks, and they show a noticeable performance improvement.
The difference only shows up when the cached length is not used, so each benchmark either invalidates the cache or removes the caching code path.
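Why the cache matters for the measurement can be shown with a toy model. This is only a sketch in pure Python, not the C code in the PR: the class name `CountingStore`, the `full_scans` counter, and the assumption that the fallback path counts entries by walking every key are all illustrative.

```python
class CountingStore:
    """Toy stand-in for a dbm.gnu object (hypothetical, not the real C code)."""

    def __init__(self):
        self._data = {}
        self._cached_size = -1  # -1 means "cached length is invalid"
        self.full_scans = 0     # number of times every key had to be walked

    def __setitem__(self, key, value):
        self._data[key] = value
        self._cached_size = -1  # any write invalidates the cached length

    def __len__(self):
        if self._cached_size < 0:
            # Assumed slow path: count entries by iterating over all keys.
            self.full_scans += 1
            self._cached_size = sum(1 for _ in self._data)
        return self._cached_size


kv = CountingStore()
for i in range(1000):
    kv[f"key-{i}"] = f"value-{i}"

len(kv)                  # first call walks all keys
len(kv)                  # second call is answered from the cache
kv["extra"] = "value"    # write invalidates the cache
len(kv)                  # walks all keys again
print(kv.full_scans)
```

In this model, a benchmark that only calls `len(kv)` repeatedly would time the cache hit rather than the counting itself, which is why the benchmarks either write before each call or strip the cache.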

Benchmark 1

Run len(kv) after storing a new value, so that every call sees an invalidated cache.

+-----------+------------------+------------------------------+
| Benchmark | bpo-32494-master | bpo-32494-proposed           |
+===========+==================+==============================+
| bpo-32494 | 262 us           | 42.2 us: 6.20x faster (-84%) |
+-----------+------------------+------------------------------+

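import pyperf

runner = pyperf.Runner()
runner.timeit(
    name="bpo-32494",
    # Each iteration reads the length, then stores a new key,
    # which invalidates the cached length for the next iteration.
    stmt="""
ret = len(kv)
kv[f'key-{ret}'] = f'value-{ret}'
""",
    # Create the database and fill it with 1000 entries.
    setup="""
import dbm.gnu as gdbm
from test.support import TESTFN
kv = gdbm.open(TESTFN, 'c')
for i in range(1000):
    kv[f'key-{i}'] = f'value-{i}'
""",
)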
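For a quick sanity check without pyperf, the same loop can be approximated with the standard library. This is a rough sketch, not the script used for the numbers above: the function name `time_len_after_write`, the temporary file name, and the fallback to `dbm.dumb` when `dbm.gnu` is not built are assumptions, and plain `timeit` results are not comparable to pyperf's calibrated output.

```python
import os
import tempfile
import timeit

try:
    import dbm.gnu as db_module   # the module this PR changes
except ImportError:
    import dbm.dumb as db_module  # assumption: fallback so the sketch still runs


def time_len_after_write(n_keys=1000, loops=200):
    """Return the average seconds per (len + insert) pair."""
    with tempfile.TemporaryDirectory() as tmp:
        kv = db_module.open(os.path.join(tmp, "bench.db"), "c")
        try:
            # Same setup as the benchmark: pre-fill the database.
            for i in range(n_keys):
                kv[f"key-{i}"] = f"value-{i}"

            def step():
                ret = len(kv)                      # length lookup being timed
                kv[f"key-{ret}"] = f"value-{ret}"  # write before the next len()

            return timeit.timeit(step, number=loops) / loops
        finally:
            kv.close()


if __name__ == "__main__":
    print(f"{time_len_after_write() * 1e6:.1f} us per iteration")
```

Running it once on a build without the change and once with it should show the same direction of change as the table, although the absolute values will differ by machine and GDBM version.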

Benchmark 2

Remove the caching code path so that len(kv) can be measured without storing a new key/value pair.

+-----------+--------------------+-------------------------------+
| Benchmark | bpo-32494-master-1 | bpo-32494-proposed-1          |
+===========+====================+===============================+
| bpo-32494 | 109 us             | 590 ns: 185.32x faster (-99%) |
+-----------+--------------------+-------------------------------+

import pyperf

runner = pyperf.Runner()
runner.timeit(
    name="bpo-32494",
    # Only the length lookup is timed; nothing is written.
    stmt="""
ret = len(kv)
""",
    # Create the database and fill it with 1000 entries.
    setup="""
import dbm.gnu as gdbm
from test.support import TESTFN
kv = gdbm.open(TESTFN, 'c')
for i in range(1000):
    kv[f'key-{i}'] = f'value-{i}'
""",
)
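The "x faster" and percentage columns in the tables are two views of the same ratio. The sketch below recomputes them from the rounded means shown in the tables; the helper names `speedup` and `percent_change` are illustrative and not part of pyperf. Because the inputs are already rounded for display, the recomputed ratios (about 6.21x and 184.75x) differ slightly from the printed 6.20x and 185.32x, which presumably come from unrounded means.

```python
def speedup(old_seconds, new_seconds):
    """How many times faster the new timing is than the old one."""
    return old_seconds / new_seconds


def percent_change(old_seconds, new_seconds):
    """Relative change in running time, as a percentage (negative = faster)."""
    return (new_seconds - old_seconds) / old_seconds * 100


US = 1e-6  # microsecond
NS = 1e-9  # nanosecond

# Benchmark 1: 262 us -> 42.2 us
b1 = (262 * US, 42.2 * US)
# Benchmark 2: 109 us -> 590 ns
b2 = (109 * US, 590 * NS)

for name, (old, new) in (("Benchmark 1", b1), ("Benchmark 2", b2)):
    print(f"{name}: {speedup(old, new):.2f}x faster "
          f"({percent_change(old, new):.0f}%)")
```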

https://bugs.python.org/issue32494