[Python-Dev] big performance hit in the past few days

Skip Montanaro skip@pobox.com
Wed, 3 Apr 2002 18:33:53 -0600


After Guido checked in the bool() stuff I cvs up'd and rebuilt. A few days ago I spent some time trying to quantify the effect of changes to SMALL_REQUEST_THRESHOLD on pymalloc performance. The "benchmark" consists of using the compiler package to compile Lib/*.py and three runs of the pystone main program with LOOPS set to 100000 (10x the usual value).
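The compile half of that benchmark can be approximated as follows. This is only a rough sketch: the run described above used the compiler package, which exists only in Python 2, so the built-in compile() stands in for it here, and the helper name time_compile is made up for illustration. The pystone half is not covered.

```python
import glob
import os
import time
import tokenize

def time_compile(directory):
    """Compile every *.py file in `directory`; return (file count, seconds).

    Assumption: the built-in compile() is used as a stand-in for the
    Python 2 compiler package mentioned in the message.
    """
    paths = sorted(glob.glob(os.path.join(directory, "*.py")))
    start = time.perf_counter()
    for path in paths:
        # tokenize.open() honours the file's encoding declaration.
        with tokenize.open(path) as f:
            source = f.read()
        compile(source, path, "exec")
    # The elapsed time includes reading the files as well as compiling them.
    return len(paths), time.perf_counter() - start
```

To point it at the standard library, pass sysconfig.get_paths()["stdlib"] as the directory.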

On March 31, I got the following output with and without pymalloc enabled and a SMALL_REQUEST_THRESHOLD of 256:

w/ pymalloc

Pystone(1.1) time for 100000 passes = 16.81
This machine benchmarks at 5948.84 pystones/second
Pystone(1.1) time for 100000 passes = 16.82
This machine benchmarks at 5945.3 pystones/second
Pystone(1.1) time for 100000 passes = 16.83
This machine benchmarks at 5941.77 pystones/second
243.84user 0.23system 4:12.73elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (402major+4432minor)pagefaults 0swaps

w/o pymalloc

Pystone(1.1) time for 100000 passes = 17.66
This machine benchmarks at 5662.51 pystones/second
Pystone(1.1) time for 100000 passes = 17.67
This machine benchmarks at 5659.31 pystones/second
Pystone(1.1) time for 100000 passes = 17.66
This machine benchmarks at 5662.51 pystones/second
277.88user 0.21system 4:48.10elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (400major+3943minor)pagefaults 0swaps

Running the same benchmark just now I got:

w/ pymalloc

Pystone(1.1) time for 100000 passes = 25.1
This machine benchmarks at 3984.06 pystones/second
Pystone(1.1) time for 100000 passes = 24.99
This machine benchmarks at 4001.6 pystones/second
Pystone(1.1) time for 100000 passes = 24.74
This machine benchmarks at 4042.04 pystones/second
352.33user 0.97system 6:51.40elapsed 85%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (499major+4181minor)pagefaults 0swaps

w/o pymalloc

Pystone(1.1) time for 100000 passes = 25.01
This machine benchmarks at 3998.4 pystones/second
Pystone(1.1) time for 100000 passes = 25.09
This machine benchmarks at 3985.65 pystones/second
Pystone(1.1) time for 100000 passes = 25.18
This machine benchmarks at 3971.41 pystones/second
374.38user 0.26system 6:37.71elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (401major+3950minor)pagefaults 0swaps

All files were compiled using gcc 3.0.4 with OPT set at -O3.

The fact that the tests slowed down dramatically both with and without pymalloc enabled suggests that recent changes to obmalloc are not to blame. (On March 31, I was using obmalloc.c 2.24. Today I'm using 2.27.)
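To put a number on "dramatically", the pystone times above can be reduced to a slowdown factor for each configuration. A minimal sketch, using only the timings quoted in this message; the helper name slowdown and the dictionary layout are made up for illustration:

```python
from statistics import mean

# Pystone times (seconds per 100000 passes) copied from the runs above.
times = {
    "w/ pymalloc": {"march31": [16.81, 16.82, 16.83],
                    "april3": [25.1, 24.99, 24.74]},
    "w/o pymalloc": {"march31": [17.66, 17.67, 17.66],
                     "april3": [25.01, 25.09, 25.18]},
}

def slowdown(before, after):
    """Ratio of the mean run time after the change to the mean before it."""
    return mean(after) / mean(before)

for label, runs in times.items():
    factor = slowdown(runs["march31"], runs["april3"])
    print(f"{label}: {factor:.2f}x slower")
```

Both configurations come out at roughly 1.4x to 1.5x slower, which fits the reading that the regression is not specific to pymalloc.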

Any thoughts on the possible cause? It's tough to casually suggest a particular culprit because the bool() stuff touched a lot of files. I can't simply identify a few files that changed in the past few days. I count 66 .[ch] files new or updated since mid-afternoon April 1.

Skip