bpo-32436: Don't use native popcount() (also fixes bpo-32641) by 1st1 · Pull Request #5292 · python/cpython

SSE 4.2 is fairly recent, and there's plenty of hardware out there that doesn't support it. To minimize the risk of a CPython build not running on older CPUs, it's easier to just stop using the popcnt instruction.
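For readers unfamiliar with what a software popcount looks like, here is a minimal sketch of the widely known shift-and-mask bit-counting trick for 32-bit values. It is written in Python purely for illustration; the name `portable_popcount32` is hypothetical, and the real fallback lives in C in hamt.c and may differ in detail.

```python
MASK32 = 0xFFFFFFFF  # emulate C uint32_t wrap-around in Python


def portable_popcount32(i):
    """Count set bits in a 32-bit unsigned value without a native instruction.

    Hypothetical helper for illustration only, not CPython's actual code.
    """
    i &= MASK32
    # Each 2-bit field now holds the number of set bits in that field.
    i = i - ((i >> 1) & 0x55555555)
    # Sum neighbouring 2-bit counts into 4-bit fields.
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333)
    # Sum neighbouring 4-bit counts into bytes.
    i = (i + (i >> 4)) & 0x0F0F0F0F
    # The multiply adds all four byte counts into the top byte.
    return ((i * 0x01010101) & MASK32) >> 24


# Cross-check against a straightforward string-based count.
for value in (0, 1, 0b1011, 0x80000000, 0xFFFFFFFF, 0x12345678):
    assert portable_popcount32(value) == bin(value).count("1")
```

The whole computation is a handful of shifts, masks, and one multiply, with no branches and no dependence on CPU feature flags.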

I used the following micro-benchmark to make a decision to drop native popcount and always use the portable fallback code:

import time from _testcapi import hamt

h = hamt() for i in range(10000): h = h.set(str(i), i)

print(len(h), h.get('123'))

st = time.monotonic() for _ in range(10**6): h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123') h.get('123')

print(f'{time.monotonic() - st:.3f}s')

The results were the same on builds with and without native popcount.
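When two builds are this close, run-to-run noise can matter more than the change itself. One way to reduce it is to repeat the measurement and keep the fastest run. The sketch below does that with the standard `timeit` module. It is not the benchmark used for this PR: `best_time` is a hypothetical helper, and a plain dict stands in for the hamt object so the snippet runs on any interpreter.

```python
import timeit


def best_time(stmt, setup, number=100_000, repeat=5):
    """Return the fastest total time in seconds over `repeat` runs of `number` loops."""
    return min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))


# Assumption: a dict replaces the hamt object here; on a CPython build that
# exposes _testcapi.hamt, the setup string could build the hamt instead.
SETUP = "h = {str(i): i for i in range(10000)}"

elapsed = best_time("h.get('123')", SETUP)
print(f"{elapsed:.3f}s")
```

Running the same script against each build and comparing the minimum times gives a steadier comparison than a single timed loop.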

To test the popcount instruction I've compiled CPython with `CFLAGS="-march=native"`. lldb session:

{pydev} ~/d/p/cpython (master %) » lldb -- ./python.exe t.py
(lldb) target create "./python.exe"
Current executable set to './python.exe' (x86_64).
(lldb) settings set -- target.run-args  "t.py"
(lldb) breakpoint set --name hamt_bitcount
Breakpoint 1: 5 locations.
(lldb) run
Process 59304 launched: './python.exe' (x86_64)
python.exe was compiled with optimization - stepping may behave oddly; variables may not be available.
Process 59304 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.3
    frame #0: 0x00000001001171f9 python.exe`hamt_node_bitmap_assoc [inlined] hamt_bitcount(i=0) at hamt.c:446 [opt]
   443 	#if defined(__GNUC__) && (__GNUC__ > 4)
   444 	    return (uint32_t)__builtin_popcountl(i);
   445 	#elif defined(__clang__) && (__clang_major__ > 3)
-> 446 	    return (uint32_t)__builtin_popcountl(i);
   447 	#elif defined(_MSC_VER)
   448 	    return (uint32_t)__popcnt(i);
   449 	#else
Target 0: (python.exe) stopped.
(lldb) disassemble --pc
python.exe`hamt_node_bitmap_assoc:
->  0x1001171f9 <+41>: popcntq %rdi, %r12
    0x1001171fe <+46>: btl    %r13d, %eax
    0x100117202 <+50>: jae    0x1001173e6               ; <+534> at hamt.c
    0x100117208 <+56>: leal   (%r12,%r12), %eax

https://bugs.python.org/issue32436