bpo-32436: Don't use native popcount() (also fixes bpo-32641) by 1st1 · Pull Request #5292 · python/cpython
SSE 4.2 is fairly recent, and there's plenty of hardware out there that doesn't support it. To minimize the risk of a CPython build not running on older CPUs, it's easier to just stop using the `popcnt` instruction.
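Without `popcnt`, the bit count has to come from ordinary integer arithmetic. A common portable approach is the SWAR ("SIMD within a register") trick from Sean Anderson's *Bit Twiddling Hacks*, which sums bits in progressively wider fields and finishes with a single multiply. The sketch below renders that technique in Python, with explicit 32-bit masking to mimic C's `uint32_t`. It illustrates the general idea and is not necessarily identical to the fallback in `hamt.c`; the name `popcount32` is hypothetical.

```python
MASK32 = 0xFFFFFFFF  # emulate C uint32_t wraparound


def popcount32(i: int) -> int:
    """Count the set bits of a 32-bit unsigned value using only shifts, masks, adds and one multiply."""
    i &= MASK32
    # Step 1: each 2-bit field now holds the number of set bits it had (0..2).
    i = (i - ((i >> 1) & 0x55555555)) & MASK32
    # Step 2: add neighbouring 2-bit fields into 4-bit fields (0..4).
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333)
    # Step 3: add neighbouring 4-bit fields into bytes (0..8 per byte).
    i = (i + (i >> 4)) & 0x0F0F0F0F
    # Step 4: multiplying by 0x01010101 sums all four bytes into the top byte.
    return ((i * 0x01010101) & MASK32) >> 24
```

For example, `popcount32(0b1011)` returns 3, matching `bin(0b1011).count("1")`. The appeal of this approach is that it needs no CPU feature detection at all, which is the portability property the change is after.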
I used the following micro-benchmark to decide to drop native popcount and always use the portable fallback code:
import time from _testcapi import hamt
h = hamt() for i in range(10000): h = h.set(str(i), i)
print(len(h), h.get('123'))
st = time.monotonic() for _ in range(10**6): h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123')
print(f'{time.monotonic() - st:.3f}s')
The timings were the same for builds with and without popcount.
To test a build that uses the popcount instruction, I compiled CPython with `CFLAGS="-march=native"`. Here is an lldb session confirming that `popcntq` is emitted:
```
{pydev} ~/d/p/cpython (master %) » lldb -- ./python.exe t.py
(lldb) target create "./python.exe"
Current executable set to './python.exe' (x86_64).
(lldb) settings set -- target.run-args "t.py"
(lldb) breakpoint set --name hamt_bitcount
Breakpoint 1: 5 locations.
(lldb) run
Process 59304 launched: './python.exe' (x86_64)
python.exe was compiled with optimization - stepping may behave oddly; variables may not be available.
Process 59304 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.3
    frame #0: 0x00000001001171f9 python.exe`hamt_node_bitmap_assoc [inlined] hamt_bitcount(i=0) at hamt.c:446 [opt]
   443  #if defined(__GNUC__) && (__GNUC__ > 4)
   444      return (uint32_t)__builtin_popcountl(i);
   445  #elif defined(__clang__) && (__clang_major__ > 3)
-> 446      return (uint32_t)__builtin_popcountl(i);
   447  #elif defined(_MSC_VER)
   448      return (uint32_t)__popcnt(i);
   449  #else
Target 0: (python.exe) stopped.
(lldb) disassemble --pc
python.exe`hamt_node_bitmap_assoc:
->  0x1001171f9 <+41>: popcntq %rdi, %r12
    0x1001171fe <+46>: btl    %r13d, %eax
    0x100117202 <+50>: jae    0x1001173e6               ; <+534> at hamt.c
    0x100117208 <+56>: leal   (%r12,%r12), %eax
```
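Some context on why the bit count sits on the lookup/insert path: in a bitmap-indexed HAMT node, each of the 32 possible 5-bit hash fragments owns one bit of the node's bitmap, and only the present entries are stored, packed, in an array. Counting the set bits *below* an entry's bit gives its slot in that array. The `btl` and the doubled count (`leal (%r12,%r12)`) right after `popcntq` look consistent with keys and values being stored as interleaved pairs, but that reading of the disassembly is an assumption. The Python sketch below shows the indexing idea only; `bitmap_index` and `bitmap_node_get` are hypothetical helpers, not CPython API, and only leaf key/value entries are modeled (real nodes can also hold sub-nodes).

```python
from typing import Any, List, Optional


def popcount(x: int) -> int:
    # Portable bit count for a non-negative int.
    return bin(x).count("1")


def bitmap_index(bitmap: int, bit: int) -> int:
    # Set bits strictly below `bit` = position of that entry in the packed array.
    return popcount(bitmap & (bit - 1))


def bitmap_node_get(bitmap: int, array: List[Any], fragment: int, key: Any) -> Optional[Any]:
    """Look up `key` in a hypothetical bitmap node; `fragment` is 0..31."""
    bit = 1 << fragment
    if not bitmap & bit:
        return None  # no entry stored for this hash fragment
    idx = bitmap_index(bitmap, bit)
    # Keys and values are assumed to be interleaved: [k0, v0, k1, v1, ...].
    k, v = array[2 * idx], array[2 * idx + 1]
    return v if k == key else None


# Example: a node holding entries for hash fragments 3, 7 and 20.
bitmap = (1 << 3) | (1 << 7) | (1 << 20)
array = ["a", 1, "b", 2, "c", 3]
print(bitmap_node_get(bitmap, array, 7, "b"))  # → 2
```

Because every lookup or insert that reaches a bitmap node needs this count, its cost would show up directly in the `h.get()` benchmark, which is why equal timings with and without `popcnt` are a meaningful result.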