[Python-Dev] Reworking the GIL
Collin Winter collinw at gmail.com
Mon Oct 26 21:01:34 CET 2009
On Sun, Oct 25, 2009 at 1:22 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Having other people test it would be fine. Even better if you have an
> actual multi-threaded py3k application. But ccbench results for other
> OSes would be nice too :-)
My results for a 2.4 GHz Intel Core 2 Duo MacBook Pro (OS X 10.5.8):
Control (py3k @ r75723)
--- Throughput ---

Pi calculation (Python)

threads=1: 633 iterations/s.
threads=2: 468 ( 74 %)
threads=3: 443 ( 70 %)
threads=4: 442 ( 69 %)

regular expression (C)

threads=1: 281 iterations/s.
threads=2: 282 ( 100 %)
threads=3: 282 ( 100 %)
threads=4: 282 ( 100 %)

bz2 compression (C)

threads=1: 379 iterations/s.
threads=2: 735 ( 193 %)
threads=3: 733 ( 193 %)
threads=4: 724 ( 190 %)

--- Latency ---

Background CPU task: Pi calculation (Python)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 1 ms. (std dev: 1 ms.)
CPU threads=2: 1 ms. (std dev: 2 ms.)
CPU threads=3: 3 ms. (std dev: 6 ms.)
CPU threads=4: 2 ms. (std dev: 3 ms.)

Background CPU task: regular expression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 975 ms. (std dev: 577 ms.)
CPU threads=2: 1035 ms. (std dev: 571 ms.)
CPU threads=3: 1098 ms. (std dev: 556 ms.)
CPU threads=4: 1195 ms. (std dev: 557 ms.)

Background CPU task: bz2 compression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 2 ms.)
CPU threads=2: 4 ms. (std dev: 5 ms.)
CPU threads=3: 0 ms. (std dev: 0 ms.)
CPU threads=4: 1 ms. (std dev: 4 ms.)
Experiment (newgil branch @ r75723)
--- Throughput ---

Pi calculation (Python)

threads=1: 651 iterations/s.
threads=2: 643 ( 98 %)
threads=3: 637 ( 97 %)
threads=4: 625 ( 95 %)

regular expression (C)

threads=1: 298 iterations/s.
threads=2: 296 ( 99 %)
threads=3: 288 ( 96 %)
threads=4: 287 ( 96 %)

bz2 compression (C)

threads=1: 378 iterations/s.
threads=2: 720 ( 190 %)
threads=3: 724 ( 191 %)
threads=4: 718 ( 189 %)

--- Latency ---

Background CPU task: Pi calculation (Python)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 1 ms.)
CPU threads=2: 0 ms. (std dev: 1 ms.)
CPU threads=3: 0 ms. (std dev: 0 ms.)
CPU threads=4: 1 ms. (std dev: 5 ms.)

Background CPU task: regular expression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 1 ms. (std dev: 0 ms.)
CPU threads=2: 2 ms. (std dev: 1 ms.)
CPU threads=3: 2 ms. (std dev: 2 ms.)
CPU threads=4: 2 ms. (std dev: 1 ms.)

Background CPU task: bz2 compression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 0 ms.)
CPU threads=2: 2 ms. (std dev: 3 ms.)
CPU threads=3: 0 ms. (std dev: 1 ms.)
CPU threads=4: 0 ms. (std dev: 0 ms.)
I also ran this through Unladen Swallow's threading microbenchmark, which is a straight copy of what David Beazley was experimenting with (simply iterating over 1000000 ints in pure Python) [1]. "iterative_count" is doing the loops one after the other, "threaded_count" is doing the loops in parallel using threads.
The results below are benchmarking py3k as the control, newgil as the experiment. When it says "x% faster", that is a measure of newgil's performance over py3k's.
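For reference, the two modes of that microbenchmark look roughly like the following. This is a simplified sketch of what bm_threading.py measures, not a copy of the actual harness; the timing helpers and parameter names here are my own illustration:

```python
import threading
import time

N = 1_000_000  # loop size from David Beazley's experiments

def count():
    # Pure-Python loop over 1,000,000 ints; holds the GIL the whole time.
    for _ in range(N):
        pass

def iterative_count(num_loops):
    # "iterative_count": run the loops one after the other.
    start = time.perf_counter()
    for _ in range(num_loops):
        count()
    return time.perf_counter() - start

def threaded_count(num_threads):
    # "threaded_count": run the same loops in parallel threads. Under the
    # old GIL this could be slower than the iterative version on multicore
    # machines because of contention on the lock.
    threads = [threading.Thread(target=count) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    print("iterative:", iterative_count(2))
    print("threaded: ", threaded_count(2))
```

Since the loop body is pure Python, the threaded variant gets no parallel speedup either way; what changes between the old and new GIL is how much time is wasted fighting over the lock.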
With two threads:
iterative_count:
Min: 0.336573 -> 0.387782: 13.21% slower
  # I've run this configuration multiple times and gotten the same slowdown.
Avg: 0.338473 -> 0.418559: 19.13% slower
Significant (t=-38.434785, a=0.95)

threaded_count:
Min: 0.529859 -> 0.397134: 33.42% faster
Avg: 0.581786 -> 0.429933: 35.32% faster
Significant (t=70.100445, a=0.95)
With four threads:
iterative_count:
Min: 0.766617 -> 0.734354: 4.39% faster
Avg: 0.771954 -> 0.751374: 2.74% faster
Significant (t=22.164103, a=0.95)
Stddev: 0.00262 -> 0.00891: 70.53% larger

threaded_count:
Min: 1.175750 -> 0.829181: 41.80% faster
Avg: 1.224157 -> 0.867506: 41.11% faster
Significant (t=161.715477, a=0.95)
Stddev: 0.01900 -> 0.01120: 69.65% smaller
With eight threads:
iterative_count:
Min: 1.527794 -> 1.447421: 5.55% faster
Avg: 1.536911 -> 1.479940: 3.85% faster
Significant (t=35.559595, a=0.95)
Stddev: 0.00394 -> 0.01553: 74.61% larger

threaded_count:
Min: 2.424553 -> 1.677180: 44.56% faster
Avg: 2.484922 -> 1.723093: 44.21% faster
Significant (t=184.766131, a=0.95)
Stddev: 0.02874 -> 0.02956: 2.78% larger
I'd be interested in multithreaded benchmarks with less-homogeneous workloads.
Collin Winter
[1] - http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_threading.py