[Python-Dev] Inplace operations for PyLong objects

Manciu, Catalin Gabriel catalin.gabriel.manciu at intel.com
Thu Aug 31 14:40:07 EDT 2017


Hi everyone,

While looking over the PyLong source code in Objects/longobject.c, I noticed that the PyLong object doesn't implement basic inplace operations such as addition or multiplication:

    [...]
    long_long,                  /*nb_int*/
    0,                          /*nb_reserved*/
    long_float,                 /*nb_float*/
    0,                          /* nb_inplace_add */
    0,                          /* nb_inplace_subtract */
    0,                          /* nb_inplace_multiply */
    0,                          /* nb_inplace_remainder */
    [...]
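As a small illustration of what the empty slots mean at the Python level (a sketch, not part of the patch): because the inplace slots are 0, int exposes no __iadd__, and += falls back to the regular add and rebinds the name to a new object. The variable names a and b are only for illustration.

```python
# int has no in-place add slot, so the type exposes no __iadd__;
# a mutable type such as list does.
print(hasattr(int, "__iadd__"))   # False
print(hasattr(list, "__iadd__"))  # True

a = 10 ** 20   # large enough to be outside the preallocated small ints
b = a          # second reference to the same object
a += 1         # falls back to int.__add__ and rebinds a to a new object

print(a is b)          # False: a now names a different object
print(b == 10 ** 20)   # True: the original object was not modified
```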

While I understand that the immutable nature of this type of object justifies this approach, I wanted to experiment and see how much of a performance gain an inplace add would bring. My inplace add reverts to calling the default long_add function when:
- the refcount of the first operand indicates that it's being shared, or
- that operand is one of the preallocated 'small ints'.

These checks should mitigate the effects of not conforming to the PyLong immutability specification. The inplace add also allocates a new PyLong only in case of a potential overflow.
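The fallback rules above can be modelled roughly in Python. This is only a sketch of the decision logic, not the C patch itself: the function names can_add_in_place and ndigits are hypothetical, the refcount threshold of 1 and the reading of "overflow" as "the result needs more digits than the operand currently has" are assumptions, and 30-bit digits are assumed (the usual setting on 64-bit builds). The small-int range of -5 to 256 is CPython's preallocated cache.

```python
SMALL_INT_MIN, SMALL_INT_MAX = -5, 256  # CPython's preallocated small ints
DIGIT_BITS = 30                         # assumption: 30-bit digits


def ndigits(value):
    """Number of DIGIT_BITS-sized digits needed to hold abs(value)."""
    bits = abs(value).bit_length()
    return max(1, -(-bits // DIGIT_BITS))  # ceiling division, at least 1


def can_add_in_place(refcount, value, increment):
    """Hypothetical model: True if the add could reuse the first operand."""
    if refcount > 1:
        # Assumption: more than one reference means the object is shared.
        return False
    if SMALL_INT_MIN <= value <= SMALL_INT_MAX:
        # Small ints are shared interpreter-wide and must never be mutated.
        return False
    if ndigits(value + increment) > ndigits(value):
        # Assumed meaning of "overflow": the result no longer fits in the
        # digits already allocated, so a new object is required.
        return False
    return True


print(can_add_in_place(1, 10 ** 12, 1))       # True: unshared, fits in place
print(can_add_in_place(2, 10 ** 12, 1))       # False: shared operand
print(can_add_in_place(1, 100, 1))            # False: small int
print(can_add_in_place(1, (1 << 30) - 1, 1))  # False: needs an extra digit
```

In this model, every case that returns False corresponds to falling back to the default long_add path.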

The workload I used to evaluate this is a simple script that does a lot of inplace adding:

import time
import sys

def write_progress(prev_percentage, value, limit):
    percentage = (100 * value) // limit
    if percentage != prev_percentage:
        sys.stdout.write("%d%%\r" % (percentage))
        sys.stdout.flush()
    return percentage

progress = -1
the_value = 0
the_increment = ((1 << 30) - 1)
crt_iter = 0
total_iters = 10 ** 9

start = time.time()

while crt_iter < total_iters:
    the_value += the_increment
    crt_iter += 1
    
    progress = write_progress(progress, crt_iter, total_iters)

end = time.time()

print ("\n%.3fs" % (end - start))
print ("the_value: %d" % (the_value))

Running the baseline version outputs:

    ./python inplace.py
    100%
    356.633s
    the_value: 1073741823000000000

Running the modified version outputs:

    ./python inplace.py
    100%
    308.606s
    the_value: 1073741823000000000

In summary, the modified version runs 13.47% faster (measured as the reduction in wall-clock time relative to the baseline). The CPython revision I'm using is 7f066844a79ea201a28b9555baf4bceded90484f from the master branch, and I'm running on an i7-6700K CPU with Turbo Boost disabled (frequency pinned at 4GHz).
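For reference, the percentage can be reproduced from the two timings reported above, assuming it is the reduction in runtime relative to the baseline. The variable names are only for illustration.

```python
baseline_s = 356.633  # runtime of the unmodified interpreter, from the output above
modified_s = 308.606  # runtime with the inplace add, from the output above

# Reduction in runtime relative to the baseline, in percent.
improvement_pct = (baseline_s - modified_s) / baseline_s * 100
print("%.2f%%" % improvement_pct)  # 13.47%
```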

Do you think such an optimization would be a good approach?

Thank you,
Catalin


