[Python-Dev] Optimize Python long integers
Victor Stinner victor.stinner at haypocalc.com
Tue Nov 11 14:25:18 CET 2008
- Previous message: [Python-Dev] Python 3.0rc2: problem with exec()ing files
- Next message: [Python-Dev] Optimize Python long integers
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi,
Patches
There are some very interesting propositions (with patches!) to optimize the Python int and long types, especially long integers.
haypo: Macros for PyLong: sign, number of digits, fits in an int http://bugs.python.org/issue4294
marketdickinson: Use 30-bit digits instead of 15-bit digits for integers http://bugs.python.org/issue4258
pernici: faster long multiplication http://bugs.python.org/issue3944
haypo: Victor Stinner's GMP patch for longs http://bugs.python.org/issue1814
fredrikj: Asymptotically faster divmod and str(long) http://bugs.python.org/issue3451
See also:
fredrikj: create a numbits() method for int and long types http://bugs.python.org/issue3439
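The numbits() proposal from issue #3439 was later adopted as int.bit_length() (in Python 2.7/3.1). To make the semantics concrete, here is a minimal pure-Python sketch of what such a method computes; the implementation is illustrative, not the patch itself:

```python
def numbits(n):
    """Number of bits needed to represent abs(n), ignoring the sign.

    Illustrative sketch of the numbits()/bit_length() semantics:
    numbits(0) == 0, and for n != 0 it is the position of the
    highest set bit of abs(n), counting from 1.
    """
    n = abs(n)
    bits = 0
    while n:
        n >>= 1
        bits += 1
    return bits

assert numbits(0) == 0
assert numbits(255) == 8
assert numbits(256) == 9
assert numbits(-1024) == 11
```

On a modern Python this matches the built-in method: numbits(n) == n.bit_length() for any int n.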
Benchmark
I tried to benchmark all these patches using pystone and pybench, but the results are too noisy. Pystone results vary by +/- 20% between runs of the same code. I tried more loops (pystone 250000), but it doesn't change anything. Pybench is better, but it is still too inaccurate to benchmark operations on integers.
That's why I started to write a basic benchmark tool to compare the different patches: see the file bench_int.py attached to issue #4294.
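For reference, here is a minimal sketch of this kind of micro-benchmark built on the standard timeit module. The case names and statements are illustrative assumptions, not the contents of bench_int.py:

```python
import timeit

# Illustrative micro-benchmark for integer arithmetic; NOT bench_int.py
# from issue #4294, just a sketch of the approach: time tight loops over
# the operations the patches target, and keep the best of several runs.
CASES = {
    "add small": "x = 0\nfor i in range(1000): x = x + i",
    "add large": "x = 1 << 1321\nfor i in range(1000): x = x + i",
    "mul":       "x = 12345\nfor i in range(1, 1000): x = (x * i) % (1 << 64)",
}

def run(repeat=5, number=200):
    # min() of repeated runs reduces scheduling noise in the measurement.
    for name, stmt in CASES.items():
        best = min(timeit.repeat(stmt, repeat=repeat, number=number))
        print("%-10s: %.1f ms" % (name, best * 1000.0))

if __name__ == "__main__":
    run()
```

Even with best-of-N timing, results of this kind fluctuate between machines and runs, which is why the numbers below should be read as rough trends.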
Benchmark on a 64-bit CPU @ 2.5 GHz:
original    :  896.5 ms
+ macros    :  891.0 ms (+0.6%) |
++ optimize :  884.3 ms (+1.4%) |-- issue #4294
+++ shift   :  880.8 ms (+1.7%) |
fast long   :  746.8 ms (+16%)  -- issue #3944
GMP         :  700.9 ms (+22%)  -- issue #1814
30 bits     :  659.9 ms (+26%)  -- issue #4258
Benchmark on a 32-bit CPU @ 3.0 GHz:
original    : 1564.3 ms
fast long   : 1591.7 ms (-2%)
GMP         : 1564.4 ms (=)
30 bits     : 1497.3 ms (+4%)
Don't trust the benchmark results too much: my tool is young and may itself be inaccurate, but it's a good start for comparing the patches.
Notes:
- my macro patch doesn't try to optimize anything; it's just preparation for further patches. I also attached "optimization" patches, but they are nearly useless (only +1.7% with all of them applied).
- 30 bits is always faster
- GMP is faster on a 64-bit CPU and at the same speed on a 32-bit CPU
- fast long is slower on a 32-bit CPU
Python integers
I tried to understand why the "30 bits" patch is not as fast as I expected, so I added statistics collection to the long type: see long_stat.patch attached to issue #4258. Summary (on a 32-bit CPU):
- PyLong_FromLong(), long_add() and long_compare() are the 3 most common operations on integers.
- Most integers are in the range [-255; 255]; large integers are rare. With the make and pystone programs, the largest integer has 1321 bits, but there is just one such integer. Another has 33 bits, and all other integers fit in 32 bits (without the sign, so 33 bits with the sign). I saw that the symbol table uses memory addresses as dictionary keys (so 32 bits on a 32-bit CPU and 64 bits on a 64-bit CPU). => we have to focus on small integers
- pystone is a poor benchmark for integers: it uses only one big integer (more than 1 digit in base 2^30) out of 1,000,000 integers => don't use pystone!
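long_stat.patch gathers these statistics inside the interpreter; here is a rough Python-level sketch of the same idea, bucketing integers by how many 30-bit digits they would need. digit_histogram is a hypothetical helper for illustration, not part of the patch:

```python
from collections import Counter

def digit_histogram(numbers, bits_per_digit=30):
    """Bucket integers by how many base-2**bits_per_digit digits they need.

    Sketch of the kind of statistics long_stat.patch collects inside the
    interpreter; here we approximate it from Python code instead.
    """
    hist = Counter()
    for n in numbers:
        bits = abs(n).bit_length()
        # 0 still occupies one digit in CPython's representation.
        digits = max(1, -(-bits // bits_per_digit))  # ceiling division
        hist[digits] += 1
    return hist

# A workload dominated by small integers, plus two rare large ones.
sample = list(range(-255, 256)) + [1 << 33, 1 << 1320]
hist = digit_histogram(sample)
# Almost everything fits in a single 30-bit digit; large ints are rare.
```

This mirrors the conclusion above: optimizing the single-digit path matters far more than speeding up multi-digit arithmetic.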
I don't have a solution that works on any CPU with all numbers; this email is just a summary of the interesting integer patches.
-- Victor Stinner aka haypo http://www.haypocalc.com/blog/