Math hypot exactfloat fastpath by rhettinger · Pull Request #8949 · python/cpython
Provide a fast path for the common case of exact float inputs. This saves the overhead of an external function call and of the x == -1.0
error check, and allows the inner loops to mostly use registers.
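As a rough behavioural illustration (not the C change itself), the sketch below shows that exact floats, float subclasses, and ints all produce the same result from math.hypot(). Only the first kind would qualify for the fast path, judging by the PyFloat_CheckExact test visible in the disassembly. The class MyFloat and the helper is_exact_float are hypothetical names introduced for this example.

```python
import math


class MyFloat(float):
    """Hypothetical float subclass: still a float, but not an *exact* float."""


def is_exact_float(obj):
    # Python-level analogue of the C macro PyFloat_CheckExact(item).
    return type(obj) is float


# All three calls give the same answer; the optimisation changes speed, not results.
exact = math.hypot(3.0, 4.0)                          # exact floats
subclassed = math.hypot(MyFloat(3.0), MyFloat(4.0))   # float subclass instances
from_ints = math.hypot(3, 4)                          # ints converted to float
```

Here is_exact_float(3.0) is true, while is_exact_float(MyFloat(3.0)) and is_exact_float(3) are both false, so only the first call consists entirely of inputs of the kind the fast path targets.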
$ ------ baseline -------
$ py -m timeit -r7 -s 'from math import dist; p=tuple(map(float, range(20))); q=tuple(reversed(p))' 'dist(p, q)'
1000000 loops, best of 7: 297 nsec per loop
$ ------ patched -------
$ py -m timeit -r7 -s 'from math import dist; p=tuple(map(float, range(20))); q=tuple(reversed(p))' 'dist(p, q)'
1000000 loops, best of 7: 215 nsec per loop
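The same measurement can be approximated from inside Python with the timeit module. This is a minimal sketch assuming Python 3.8 or later (where math.dist is available); the helper name best_nsec_per_call is hypothetical, and absolute timings will differ by machine and build. It also works out the relative improvement implied by the two figures quoted above.

```python
import timeit
from math import dist

# Same inputs as the command-line benchmark: two 20-dimensional points.
p = tuple(map(float, range(20)))
q = tuple(reversed(p))


def best_nsec_per_call(repeat=7, number=100_000):
    """Return the best observed time for one dist(p, q) call, in nanoseconds."""
    timer = timeit.Timer("dist(p, q)", globals={"dist": dist, "p": p, "q": q})
    best_total = min(timer.repeat(repeat=repeat, number=number))
    return best_total / number * 1e9


# Relative improvement implied by the timings reported in this PR.
baseline_nsec, patched_nsec = 297, 215
speedup_percent = (baseline_nsec - patched_nsec) / baseline_nsec * 100
```

With the reported numbers, speedup_percent comes to roughly 28, i.e. the patched build spends a bit more than a quarter less time per call on this input.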
Disassembly of math_hypot() compiled with GCC 8.2 shows a very tight inner loop, with no unnecessary register spills and reloads and no external calls that would have to save and restore registers:
L378:
    xorl    %eax, %eax
    andpd   lC5(%rip), %xmm0          # x = fabs(x);
    ucomisd %xmm0, %xmm0
    movl    $1, %ecx
    setne   %al
    cmovp   %ecx, %eax
    orl     %eax, %ebx                # found_nan |= Py_IS_NAN(x);
L377:
    movsd   %xmm0, (%r12,%r15,8)      # coordinates[i] = x;
    maxsd   %xmm1, %xmm0              # if (x > max) { max = x; }
    addq    $1, %r15                  # i++
    cmpq    %r15, %rbp                # i < n
    movapd  %xmm0, %xmm1
    jle     L418
L385:
    movq    24(%r13,%r15,8), %rdi     # item = PyTuple_GET_ITEM(args, i);
    cmpq    %r14, 8(%rdi)             # if (PyFloat_CheckExact(item))
    jne     L375
    movsd   16(%rdi), %xmm0           # x = PyFloat_AS_DOUBLE(item)
    jmp     L378
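
Read back into a higher-level form, the annotated instructions correspond roughly to the loop below. This is a Python model reconstructed from the inline comments, not CPython's C source: the function name scan_coordinates is hypothetical, float(item) is only an assumed stand-in for the general conversion path taken at L375, and the final scaled square-root step is omitted because it does not appear in the listing.

```python
import math


def scan_coordinates(args):
    """Model of the argument-scanning loop; returns (coordinates, max, found_nan)."""
    n = len(args)
    coordinates = [0.0] * n
    max_value = 0.0
    found_nan = False
    for i in range(n):
        item = args[i]                  # item = PyTuple_GET_ITEM(args, i)
        if type(item) is float:         # PyFloat_CheckExact(item)
            x = item                    # fast path: PyFloat_AS_DOUBLE(item)
        else:
            x = float(item)             # assumed stand-in for the slower general path
        x = math.fabs(x)                # x = fabs(x)
        found_nan |= math.isnan(x)      # found_nan |= Py_IS_NAN(x)
        coordinates[i] = x              # coordinates[i] = x
        if x > max_value:               # if (x > max) { max = x; }
            max_value = x
    return coordinates, max_value, found_nan
```

For example, scan_coordinates((3.0, -4.0, 2)) yields the absolute values [3.0, 4.0, 2.0], a maximum of 4.0, and no NaN flag; only the last item takes the non-fast branch in this model.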