Issue 32589: Statistics as a result from timeit


Created on 2018-01-18 04:54 by MGilch, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL       Status   Linked
PR 5229   closed   MGilch, 2018-01-18 05:05
Messages (15)
msg310218 - Author: Matthias Gilch (MGilch) * Date: 2018-01-18 04:54
I propose to add a function parameter to timeit to get some statistics back from it. I think it's inconvenient to write those stubs every time when we have a method for this in the standard library that could easily be used with a few changes.
msg310220 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2018-01-18 06:02
What sort of statistics, and why do you think they are going to be meaningful? Measurement errors in the timings aren't two-sided, they are only one-sided, which means calculating the mean and standard deviation isn't appropriate. This is documented in the Timer.repeat() method:

    Note: it's tempting to calculate mean and standard deviation from the
    result vector and report these. However, this is not very useful. In a
    typical case, the lowest value gives a lower bound for how fast your
    machine can run the given code snippet; higher values in the result
    vector are typically not caused by variability in Python's speed, but
    by other processes interfering with your timing accuracy. So the min()
    of the result is probably the only number you should be interested in.

The purpose of the mean is to estimate the actual time taken by the computation, less any random errors ("noise") in the measurement values. If the random noise were centred around zero, the positive errors and the negative errors would tend to cancel, and the mean would be the best estimate of the actual computation time. But the noise isn't centred on zero (there is no negative noise), so the mean estimates the computation time plus the average measurement error, not the computation time itself.

In other words, given a bunch of measurements:

    timer = Timer(code, setup)
    values = timer.repeat()

the closest estimate to the true computation time is min(values), not mean(values).

If you still insist on generating statistics, they're not difficult to collect using the statistics module:

    statistics.mean(values)
    statistics.stdev(values)

If there are additional statistics you would like to see, they should be added to the statistics module, not timeit.
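A minimal, self-contained sketch of the approach described above, using Timer.repeat() and the statistics module (the statement being timed and the repeat/number arguments are arbitrary examples):

    import statistics
    import timeit

    # repeat() returns one total (for `number` executions) per repetition.
    timer = timeit.Timer("sorted(range(100))")
    values = timer.repeat(repeat=5, number=10000)

    # The minimum is the best estimate of the true cost of `number` runs;
    # larger values mostly reflect interference from other processes.
    print("min:  ", min(values))

    # Mean/stdev can still be computed explicitly when someone wants them.
    print("mean: ", statistics.mean(values))
    print("stdev:", statistics.stdev(values))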
msg310221 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2018-01-18 06:10
> What sort of statistics...?

I answered my own question... I see that in your PR, you return a dict with the mean, median, *population* variance and standard deviation, and total. Why the population variance/stdev rather than the sample? That seems inappropriate to me.
msg310222 - Author: Matthias Gilch (MGilch) * Date: 2018-01-18 06:21
Population seemed appropriate to me: I'd treat the list of times as a whole "population" that stands on its own. I should probably add both the sample and population values.
msg310242 - Author: Matthias Gilch (MGilch) * Date: 2018-01-18 14:36
Added sample stdev and variance to the PR.
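For illustration only, and not the code from PR 5229: a sketch of how a result dict like the one discussed above (mean, median, population and sample variance/stdev, total) could be assembled from a list of per-run timings with the statistics module. The helper name and the exact keys are assumptions.

    import statistics

    def summarize(times):
        """Hypothetical helper: summary statistics for a list of timings."""
        return {
            "total": sum(times),
            "mean": statistics.mean(times),
            "median": statistics.median(times),
            "pvariance": statistics.pvariance(times),  # population variance
            "pstdev": statistics.pstdev(times),        # population stdev
            "variance": statistics.variance(times),    # sample variance
            "stdev": statistics.stdev(times),          # sample stdev
        }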
msg310294 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-01-19 22:18
I am also skeptical that this will do more good than harm (by adding time and fostering mis-impressions). (This might better have started as a discussion on the python-ideas list.)
msg310302 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-19 23:48
I suggest you take a look at my perf project, which has a timeit function and command. perf has a different design: it stores all values, and so many statistical functions can be used. By the way, it includes the most common statistical functions in its API, and provides a few more in the CLI.

The timeit module of the stdlib computes 5 values by default... I'm not sure that it's relevant to compute the standard deviation on only 5 values. perf computes 60 values, spread across 20 different processes (run sequentially). With 60 values, I expect that computing statistics makes more sense.

About the PR itself, I dislike providing a fixed list of statistical functions. For example, what if someone needs the geometric mean? What if you want to count outliers? etc.

If someone really wants timeit to evolve, I suggest returning all values rather than only the minimum, as repeat() does. By the way, using the minimum is IMHO a bad idea, but I already proposed to change timeit, and my change was rejected because of backward compatibility. I decided to stop trying to convince other core developers and to leave the timeit module unchanged, with all its bugs. Instead, I suggest that everyone stop using it (PyPy also warns users and asks them to use perf instead), and use perf timeit instead...
msg310331 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2018-01-20 08:06
On Fri, Jan 19, 2018 at 11:48:46PM +0000, STINNER Victor wrote:
> The timeit module of the stdlib computes 5 values by default... I'm
> not sure that it's relevant to compute the standard deviation on only
> 5 values.

I made the same mistake: I too thought Matthias was talking about calculating the statistics based on timer.repeat(), but after looking more closely at the PR, I see that the statistics gathered are from the (default) 1000000 runs of each call to timer.timeit().

(Matthias: in future, please don't just dump a PR into the bug tracker with barely a word of explanation. We shouldn't have to read the source to understand what it does, let alone why it is done. Thank you.)

Given that Matthias is calculating the stats of each call to timeit(), I'm not sure whether that makes more or less sense than what I thought. His patch effectively converts timeit() to calculate the time for one run a million times, instead of the total time for one million runs. That is one of the traps that Tim Peters warns about in the O'Reilly book: timing *one* run may give unrealistic results due to the timer's quantization. So I think this will be much less accurate. (Is this still true now that timeit() uses perf_counter() as the default timer, instead of clock() or time()? I don't know.)

> About the PR itself, I dislike providing a fixed list of statistical
> functions. For example, what if someone needs the geometric mean? What
> if you want to count outliers? etc.

I'm not sure that the geometric mean would be physically meaningful, but your point is well taken. For example, I think the median might be a better choice than the mean, but if some user disagrees, that's okay.

> If someone really wants timeit to evolve, I suggest returning all
> values rather than only the minimum, as repeat() does.

timeit() doesn't return the minimum. timeit() calculates the total time for `number` (default is 1000000) runs, and returns that total; repeat() calls timeit() `repeat` (default is 3, not 5) times and returns them as a list.

> By the way, using the minimum is IMHO a bad idea, but I already
> proposed to change timeit, and my change was rejected because of
> backward compatibility.

I am interested in why you think it is a bad idea to use the minimum as the best estimate of the actual execution time.
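To make the distinction between the two APIs concrete (the statement and loop counts here are arbitrary):

    import timeit

    t = timeit.Timer("x = 2 ** 100")

    # timeit() returns the *total* time for `number` executions ...
    total = t.timeit(number=1000000)
    per_loop = total / 1000000  # ... so divide to estimate the per-run cost.

    # Timing a single execution instead is dominated by timer resolution and
    # overhead for fast statements, which is the quantization trap above.
    single = t.timeit(number=1)

    # repeat() simply calls timeit() several times and returns the totals.
    totals = t.repeat(repeat=3, number=1000000)

    print(per_loop, single, min(totals) / 1000000)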
msg310333 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-20 09:07
For minimum vs average, see: https://bugs.python.org/issue28240
msg310334 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-20 09:09
"but after looking more closely at the PR, I see that the statistics gathered are from the (default) 1000000 runs of each call to timer.timeit()."

Ah, don't do that. timeit deliberately runs the same code multiple times and averages the result to compute one value. Reading a timer takes time, and measurements smaller than 1 ms are unstable.
msg310340 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-01-20 12:32
I concur with Victor. This would defeat the purpose of timeit. And making the type of the result depend on arguments usually means a bad design. For your particular need it is easy to write an explicit loop and gather the statistics that you need.
msg310341 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-20 13:06
> For your particular need it is easy to write an explicit loop and gather the statistics that you need.

I concur with Serhiy:

    >>> loops = 2**15; values = [value/loops for value in t.repeat(10, loops)]
    >>> import statistics
    >>> statistics.mean(values)
    3.439414190675727e-06
    >>> statistics.stdev(values)
    1.444941323254124e-07

Matthias Gilch: "I think it's inconvenient to write those stubs every time when we have a method for this in the standard library that could easily be used with a few changes."

Hmm, maybe a module on PyPI which uses timeit.Timer internally would be more appropriate than extending the timeit module. (Or use my existing perf module, it's up to you ;-))
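The same idea as the interactive session above, written out as a self-contained script (the timed statement and loop counts are arbitrary):

    import statistics
    import timeit

    t = timeit.Timer("sum(range(100))")

    loops = 2 ** 15
    # repeat() returns totals; divide by the loop count for per-loop values.
    values = [value / loops for value in t.repeat(10, loops)]

    print(statistics.mean(values))
    print(statistics.stdev(values))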
msg310356 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-01-20 17:57
If we agree that the patch is wrong, we should reject it and close this issue.
msg310408 - Author: Matthias Gilch (MGilch) * Date: 2018-01-22 09:44
Steven: Thanks for the advice. Next time I'll follow it.
Victor: I'll look into perf.
I see that there may be problems, and I agree that we can close the issue and the PR.
msg310409 - Author: Matthias Gilch (MGilch) * Date: 2018-01-22 09:45
Steven: Thanks for the advice. Next time I'll follow it.
Victor: I'll look into perf.
I see that there may be problems, and I agree that we can close the issue and the PR.
History
Date User Action Args
2022-04-11 14:58:56 admin set github: 76770
2018-01-22 09:45:16 MGilch set status: closed; resolution: rejected; stage: patch review -> resolved
2018-01-22 09:45:00 MGilch set messages: +
2018-01-22 09:44:46 MGilch set status: open -> (no value); messages: +
2018-01-20 17:57:52 terry.reedy set messages: +
2018-01-20 13:06:19 vstinner set messages: +
2018-01-20 12:32:21 serhiy.storchaka set nosy: + serhiy.storchaka; messages: +
2018-01-20 09:09:13 vstinner set messages: +
2018-01-20 09:07:39 vstinner set messages: +
2018-01-20 08:07:00 steven.daprano set messages: +
2018-01-19 23:48:46 vstinner set messages: +
2018-01-19 22:18:05 terry.reedy set nosy: + terry.reedy, vstinner; messages: +
2018-01-18 14:36:16 MGilch set messages: +
2018-01-18 06:21:15 MGilch set messages: +
2018-01-18 06:10:38 steven.daprano set messages: +
2018-01-18 06:02:39 steven.daprano set nosy: + steven.daprano, tim.peters; messages: +
2018-01-18 05:05:22 MGilch set keywords: + patch; stage: patch review; pull_requests: + pull_request5076
2018-01-18 04:54:22 MGilch create