Issue 5512: Streamline integer division (original) (raw)

Created on 2009-03-18 22:11 by mark.dickinson, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
faster_integer_division.patch mark.dickinson,2009-03-18 22:11
pidigits_bestof.py mark.dickinson,2009-03-18 22:54 Integer benchmark
pidigits-2.py vstinner,2009-03-18 23:22
faster_integer_division2.patch mark.dickinson,2009-03-21 22:19
Messages (16)
msg83781 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-03-18 22:11
Here's a patch that streamlines the x_divrem function in Objects/longobject.c. More benchmarks are needed, but the initial results are promising: for py3k, on a 32-bit machine with 15-bit digits, Victor's pidigits benchmark runs almost twice as fast with this patch (numbers below). Patch details ============= - Normalize inputs by shifting instead of multiplying and dividing. This halves the number of divisions used in the algorithm. - Streamline innermost loop. - Save an iteration of outer loop around half the time. - Save an object allocation: only 3 allocations per x_divrem call instead of 4. - Remove special case where initial quotient estimate is >= PyLong_BASE. There's no need for this, since the 'digit' type holds values up to 2*PyLong_BASE - 1. - Make q type digit instead of type twodigits: this halves the size of the multiplication in the innermost loop. Benchmark results ================= Using the pidigits_bestof.py script that's posted in the issue 4258 discussion, on a non-debug build of py3k (r70465), on OS X 10.5.6/Core 2 Duo: Unpatched --------- Macintosh-3:py3k dickinsm$ ./python.exe ../pidigits_bestof.py 2000 performing a warm up run... running sys.int_info= sys.int_info(bits_per_digit=15, sizeof_digit=2) Time; 2234.6 ms Time; 2232.2 ms Time; 2227.9 ms Time; 2225.7 ms Time; 2229.8 ms Best Time; 2225.7 ms Patched ------- dickinsm$ ./python.exe ../pidigits_bestof.py 2000 performing a warm up run... running sys.int_info= sys.int_info(bits_per_digit=15, sizeof_digit=2) Time; 1175.6 ms Time; 1176.5 ms Time; 1177.3 ms Time; 1179.5 ms Time; 1168.5 ms Best Time; 1168.5 ms So the patch gives a speedup of around 90%. This particular benchmark is heavy on the divisions though, so I'd expect lesser speedups for other benchmarks.
msg83782 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-03-18 22:17
Would it be possible to include integer benchmark tools to upstream? I mean the "pidigit" tool (I didn't wrote this test! I just ported it to Python3) and maybe my "bench_int" tool. It could useful to reproduce the benchmark on different Python versions and different computers. We may open a new issue for each tool?
msg83785 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-03-18 22:31
Sounds good to me! An integer benchmark script in the Tools/ directory would be handy.
msg83786 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-03-18 22:43
Well, the original code in pidigits is by me :-) (although it's not stated in the source file). I haven't tried the patch but the performance work you're doing is really impressive, Mark!
msg83788 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-03-18 22:47
Results here (Athlon X2 3600+ with gcc in 64-bit mode with 30-bit digits), "pidigits_bestof.py 2000": - unpatched: 1644.9 ms - patched: 694.8 ms !
msg83789 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-03-18 22:48
Useful informations for an integer benchmark: - CPU type: 32 or 64 bits, endian - Memory size - Python version - Informations about how Python was compiled - Integer implementation: use int_info when it's available, otherwise base=2^15 Who has the last version of pidigit tool? Can you attach it here? :-)
msg83790 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-03-18 22:52
[Antoine] > Well, the original code in pidigits is by me :-) Apologies for the misattribution!
msg83791 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-03-18 22:54
Here's the version of pidigits that I was using.
msg83793 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-03-18 23:22
New version of pidigit: - works on python 2.6, trunk and py3k - display Python version, CPU info (bits, endian), PyLong info (base, digit size) - display usage if no argument is given
msg83794 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-03-18 23:22
Oh, there is no version number of the benchmark tool itself!
msg83795 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-03-18 23:27
My numbers: # No patch $ ./python /home/haypo/pidigits-2.py 3000 Python 3.1a1+ (py3k:70466, Mar 18 2009, 23:56:06) [GCC 4.3.2] CPU: 64 bits, little endian PyLong: base=2^30, sizeof(digit)=32 bits (...) Best Time; 2300.2 ms # With faster_integer_division.patch (...) Best Time; 1138.1 ms Ok, it's two times faster on 64 bits CPU!!! Other notes about pidigits: - missing header (no author name, no description) - pidigit should also write the number of digits in its output to avoid mistakes in comparaisons
msg83953 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-03-21 22:19
Updated patch. Lots of cleanup, but only one significant change: the inner loop now uses signed arithmetic instead of unsigned arithmetic. This saves a negation and fixes a subtle bug: the previous inner loop code was incorrect when using 15-bit digits on machines with sizeof(short) == sizeof(long). Not that I know of any such machines: the Cray T3E famously has no 16-bit integer type, but there sizeof(short) is 4 and sizeof(long) is 8. A few more timings, this time from doing a single huge integer division: 10**1000000//10**500000 (so this effectively times just the inner loop, since all else will be insignificant). All timings are best-of-5, from non-debug builds of py3k, on the same system: OS X 10.5.6/2.4 GHz Core 2 Duo. Times in brackets are the approximate per-inner-loop times (remembering that there are 4 times as many iterations of the inner loop for 15-bit digits). 32-bit build, 15-bit digits, unpatched: 92382.2 ms (~7.5 ns) 32-bit build, 15-bit digits, patched: 36473.3 ms (~3.0 ns) 64-bit build, 30-bit digits, unpatched: 14581.4 ms (~4.8 ns) 64-bit build, 30-bit digits, patched: 7385.1 ms (~2.4 ns) ... and just for fun, the other combinations: 64-bit build, 15-bit digits, unpatched: 61927.5 ms (~5.1 ns) 64-bit build, 15-bit digits, patched: 43632.9 ms (~3.6 ns) 32-bit build, 30-bit digits, unpatched: 62374.1 ms (~20.3 ns) 32-bit build, 30-bit digits, patched: 26928.3 ms (~8.8 ns) Thanks for the updated pidigits script, Victor! Maybe this is too small right now to be worth including in the Tools directory, but I hope we can fatten it up with some other benchmarks. What do you think?
msg84027 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-03-23 20:04
Applied, r70542 and r70547.
msg84066 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-03-24 09:00
Would it be possible to port the patch to python trunk?
msg84067 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-03-24 09:10
Hi Victor! I already applied the x_divrem patch to the trunk and to py3k. Did you mean the release branch? I'd prefer not to backport this to 2.6 or 3.0: it's not a new feature, but it's not a bugfix either and there's always some risk of breakage...
msg84068 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-03-24 09:11
> I already applied the x_divrem patch to the trunk and to py3k Oops! I taught that you applied it to py3k and release30-maint :-/ It's ok, trunk and py3k is enough ;-)
History
Date User Action Args
2022-04-11 14:56:46 admin set github: 49762
2009-03-24 09:11:59 vstinner set messages: +
2009-03-24 09:10:01 mark.dickinson set messages: +
2009-03-24 09:00:07 vstinner set messages: +
2009-03-23 20:04:49 mark.dickinson set status: open -> closedresolution: acceptedmessages: +
2009-03-21 22:19:31 mark.dickinson set files: + faster_integer_division2.patchmessages: +
2009-03-18 23:27:21 vstinner set messages: +
2009-03-18 23:22:35 vstinner set messages: +
2009-03-18 23:22:07 vstinner set files: + pidigits-2.pymessages: +
2009-03-18 22:54:54 mark.dickinson set files: + pidigits_bestof.pymessages: +
2009-03-18 22:52:05 mark.dickinson set messages: +
2009-03-18 22:48:53 vstinner set messages: +
2009-03-18 22:47:45 pitrou set messages: +
2009-03-18 22:43:56 pitrou set nosy: + pitroumessages: +
2009-03-18 22:31:29 mark.dickinson set messages: +
2009-03-18 22:17:09 vstinner set nosy: + vstinnermessages: +
2009-03-18 22:11:26 mark.dickinson create