Math hypot exactfloat fastpath by rhettinger · Pull Request #8949 · python/cpython

Provide a fast path for the common case of exact float inputs. Saves the overhead of an external function call and of the x == -1.0 error check. Allows the inner loops to mostly use registers.
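To make "exact float" concrete, here is a small Python-level sketch of which inputs would qualify. The helper `is_exact_float` and the class `MyFloat` are hypothetical names used only for illustration; `type(obj) is float` is meant as a rough analogue of the C macro `PyFloat_CheckExact`. Inputs that fail the check still work; they would presumably go through the general conversion (`PyFloat_AsDouble` and its error check) instead.

```python
import math


class MyFloat(float):
    """Hypothetical float subclass: a float, but not an *exact* float."""


def is_exact_float(obj):
    # Illustrative analogue of PyFloat_CheckExact(obj): subclasses do not count.
    return type(obj) is float


inputs = {
    "exact floats": (3.0, 4.0),
    "ints": (3, 4),
    "float subclass": (MyFloat(3.0), MyFloat(4.0)),
}

results = {}
for label, args in inputs.items():
    # The fast path applies only when every argument is an exact float.
    fast = all(is_exact_float(a) for a in args)
    results[label] = (fast, math.hypot(*args))
    print(f"{label:15} exact floats: {fast!s:5} hypot = {results[label][1]}")
```

All three calls return the same value; only the conversion route inside the C code would differ.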

$ ------ baseline -------
$ py -m timeit -r7 -s 'from math import dist; p=tuple(map(float, range(20))); q=tuple(reversed(p))' 'dist(p, q)'
1000000 loops, best of 7: 297 nsec per loop

$ ------ patched -------
$ py -m timeit -r7 -s 'from math import dist; p=tuple(map(float, range(20))); q=tuple(reversed(p))' 'dist(p, q)'
1000000 loops, best of 7: 215 nsec per loop
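
The same measurement can be approximated from inside Python with the `timeit` module, which may be handy when comparing two builds in a script. This is only a sketch: the loop count is an arbitrary choice, absolute numbers depend on the machine and build, and the last lines simply do the arithmetic on the two figures quoted above.

```python
import timeit
from math import dist

# Same setup as the command-line runs above.
p = tuple(map(float, range(20)))
q = tuple(reversed(p))

loops = 100_000  # arbitrary loop count chosen for this sketch
timings = timeit.repeat(
    "dist(p, q)",
    globals={"dist": dist, "p": p, "q": q},
    number=loops,
    repeat=7,
)
nsec_per_loop = min(timings) / loops * 1e9
print(f"{loops} loops, best of 7: {nsec_per_loop:.0f} nsec per loop")

# Arithmetic on the figures reported above (297 nsec baseline, 215 nsec patched).
baseline_nsec, patched_nsec = 297, 215
reduction = (baseline_nsec - patched_nsec) / baseline_nsec
speedup = baseline_nsec / patched_nsec
print(f"time reduced by {reduction:.1%}, speedup of {speedup:.2f}x")
```

On the reported figures this works out to roughly 28% less time per call.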

Disassembly of math_hypot() using GCC 8.2 shows a very tight inner loop without unnecessary register spills and reloads and without external calls that have to save and restore registers:

L378:
xorl	%eax, %eax
andpd	lC5(%rip), %xmm0         # x = fabs(x);
ucomisd	%xmm0, %xmm0
movl	$1, %ecx
setne	%al
cmovp	%ecx, %eax
orl	%eax, %ebx               # found_nan |= Py_IS_NAN(x);
L377:
movsd	%xmm0, (%r12,%r15,8)     # coordinates[i] = x;
maxsd	%xmm1, %xmm0             # if (x > max) { max = x; }
addq	$1, %r15                 # i++
cmpq	%r15, %rbp               # i < n
movapd	%xmm0, %xmm1
jle	L418
L385:
movq	24(%r13,%r15,8), %rdi    # item = PyTuple_GET_ITEM(args, i);
cmpq	%r14, 8(%rdi)            # if (PyFloat_CheckExact(item))
jne	L375
movsd	16(%rdi), %xmm0          # x = PyFloat_AS_DOUBLE(item)
jmp	L378
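
For readers who do not read x86-64 assembly, the loop above can be modeled in Python by following the C comments in the listing. This is an illustrative model, not the CPython source: the function name `scan_coordinates` is hypothetical, the initial maximum of 0.0 is an assumption, and `float(item)` stands in for the slow path at label L375, which is not shown in the listing.

```python
import math


def scan_coordinates(args):
    """Model of the scan loop: take absolute values, track max and NaN."""
    coordinates = []
    max_abs = 0.0        # assumption: the running maximum starts at zero
    found_nan = False
    for item in args:
        if type(item) is float:      # if (PyFloat_CheckExact(item))
            x = item                 # x = PyFloat_AS_DOUBLE(item)
        else:
            x = float(item)          # stand-in for the slow path (L375, not shown)
        x = math.fabs(x)             # x = fabs(x);
        found_nan |= math.isnan(x)   # found_nan |= Py_IS_NAN(x);
        coordinates.append(x)        # coordinates[i] = x;
        if x > max_abs:              # if (x > max) { max = x; }
            max_abs = x
    return coordinates, max_abs, found_nan


coords, largest, saw_nan = scan_coordinates((3.0, -4.0, 1))
print(coords, largest, saw_nan)
```

In the compiled version each step maps to one or two instructions (`andpd` for `fabs`, `ucomisd` for the NaN test, `maxsd` for the maximum), which is why the loop body can stay in registers.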