[Python-Dev] Microbenchmarks
Victor Stinner victor.stinner at gmail.com
Thu Sep 15 15:33:28 EDT 2016
- Previous message (by thread): [Python-Dev] Python parser performance optimizations
- Next message (by thread): [Python-Dev] Microbenchmarks
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
The discussion on benchmarking is no longer related to compact dict, so I'm starting a new thread.
2016-09-15 13:27 GMT+02:00 Paul Moore <p.f.moore at gmail.com>:
Just as a side point, perf provided essentially identical results but took 2 minutes as opposed to 8 seconds for timeit to do so. I understand why perf is better, and I appreciate all the work Victor did to create it, and analyze the results, but for getting a quick impression of how a microbenchmark performs, I don't see timeit as being quite as bad as Victor is claiming.
He he, I expected such a complaint. I already wrote a section in the doc explaining "why perf is so slow": http://perf.readthedocs.io/en/latest/perf.html#why-is-perf-so-slow
So you say that timeit just works and is faster? Ok. Let's see a small session:
$ python3 -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"
10 loops, best of 3: 46.7 msec per loop
$ python3 -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"
10 loops, best of 3: 46.9 msec per loop
$ python3 -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"
10 loops, best of 3: 46.9 msec per loop
$ python3 -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"
10 loops, best of 3: 47 msec per loop

$ python2 -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"
10 loops, best of 3: 36.3 msec per loop
$ python2 -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"
10 loops, best of 3: 36.1 msec per loop
$ python2 -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"
10 loops, best of 3: 36.5 msec per loop

$ python3 -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"
10 loops, best of 3: 48.3 msec per loop
$ python3 -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"
10 loops, best of 3: 48.4 msec per loop
$ python3 -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"
10 loops, best of 3: 48.8 msec per loop
I ran timeit 7 times on Python 3 and 3 times on Python 2. Please ignore Python 2, it's just a quick command to interfere with Python 3 tests.
Now the question is: what is the "correct" result for Python3? Let's take the minimum of the minimums: 46.7 ms.
Now imagine that you only had the first 4 runs. What is the "good" result now? Min is still 46.7 ms.
And what if you only had the last 3 runs? What is the "good" result now? Min becomes 48.3 ms.
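To make the point concrete, here is a tiny sketch (plain Python, using the seven Python 3 timings quoted above) showing how the "best" result shifts depending on which runs you happen to have:

```python
# The seven Python 3 timings from the timeit runs above, in msec.
runs = [46.7, 46.9, 46.9, 47.0, 48.3, 48.4, 48.8]

print(min(runs))       # minimum of all seven runs -> 46.7
print(min(runs[:4]))   # only the first four runs  -> 46.7
print(min(runs[4:]))   # only the last three runs  -> 48.3
```

The same benchmark, on the same machine, gives a different "best" number depending on when you stop measuring.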
On such microbenchmark, the difference between 46.7 ms and 48.3 ms is large :-(
How do you know that you ran timeit enough times to be sure that the result is the right one?
For me, the timeit tool is broken because you must run it many times to work around its limits.
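For reference, the same many-runs dance can be scripted with the stdlib timeit API; the statement below is just a cheap placeholder, not the dict benchmark above. The spread between repeats is exactly the problem: timeit's convention is to report the minimum, but nothing tells you whether that minimum is stable.

```python
import timeit

# A toy statement; substitute the dict/list(d) benchmark as needed.
timings = timeit.repeat(stmt="sum(range(100))", repeat=5, number=1000)

# timeit reports only the minimum; the spread across repeats is hidden.
print("min:   ", min(timings))
print("max:   ", max(timings))
print("spread:", max(timings) - min(timings))
```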
In short, I wrote the perf module to answer these questions.
- perf uses multiple processes to test multiple memory layouts and multiple randomized hash functions
- perf ignores the first run, used to "warmup" the benchmark (--warmups command line option)
- perf provides many tools to analyze the distribution of results: minimum, maximum, standard deviation, histogram, number of samples, median, etc.
- perf displays the median +- standard deviation: the median is more reproducible, and the standard deviation gives an idea of the stability of the benchmark
- etc.
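The median +- standard deviation reporting can be sketched with the stdlib statistics module (this is only an illustration of the idea, not perf's actual code, using the seven Python 3 timings above as sample data):

```python
import statistics

# The seven Python 3 timeit results above, in msec.
samples = [46.7, 46.9, 46.9, 47.0, 48.3, 48.4, 48.8]

median = statistics.median(samples)  # robust central value
stdev = statistics.stdev(samples)    # sample standard deviation

# -> "47.0 ms +- 0.9 ms": one number for the center, one for the spread.
print("%.1f ms +- %.1f ms" % (median, stdev))
```

A single "best of 3" number hides the ~0.9 ms spread that this summary makes visible.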
I will tend to use perf now that I have it installed, and now that I know how to run a published timeit invocation using perf. It's a really cool tool. But I certainly won't object to seeing people publish timeit results (any more than I'd object to any microbenchmark).
I consider that timeit results are not reliable at all. There is no standard deviation, and it's hard to guess how many times the user ran timeit, or how he/she computed the "good" result.
perf takes ~60 seconds by default. If you don't care about accuracy, use --fast and it only takes 20 seconds ;-)
Victor