[issue45261] Unreliable (?) results from timeit (cache issue?)
Steven D'Aprano
report at bugs.python.org
Wed Sep 22 06:46:30 EDT 2021
Steven D'Aprano <steve+python at pearwood.info> added the comment:
Thanks Victor for the explanation about pyperf's additional features. They
do sound very useful. Perhaps we should consider adding some of them to
timeit?
However, in my opinion using the average is statistically wrong. Using
the mean is good when errors are two-sided, that is, when your measured
value can be either too low or too high compared to the true value:
measurement = true value ± random error
If the random errors are symmetrically distributed, then taking the
average tends to cancel them out and give you a better estimate of the
true value. Even if the errors aren't symmetrical, the mean will still
be a better estimate of the true value. (Or perhaps a trimmed mean, or
the median, if there are a lot of outliers.)
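A minimal sketch of that point, using simulated data (the true value,
noise width, and sample count are all made up for illustration): with
symmetric noise the errors cancel and the mean lands near the true
value, while the minimum is badly biased low.

```python
import random
import statistics

random.seed(42)  # reproducible for this sketch

TRUE_VALUE = 100.0  # hypothetical quantity being measured

# Two-sided, symmetric errors: each measurement may come out too low
# or too high with equal probability.
measurements = [TRUE_VALUE + random.gauss(0, 5) for _ in range(1000)]

# The symmetric errors largely cancel, so the mean is a good estimator.
print(statistics.mean(measurements))  # close to 100

# The minimum, by contrast, is biased well below the true value here.
print(min(measurements))
```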
But timing results are not like that: the measurement errors are
one-sided, not two-sided:
measurement = true value + random error
So by taking the average, all you are doing is averaging the errors, not
cancelling them. The result you get is a *worse* estimate of the true
value than the minimum.
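The same sketch with one-sided noise makes the contrast clear (again,
the true cost and noise scale are invented for illustration): when noise
can only add time, the mean is biased high by the average error, while
the minimum sits just above the true cost.

```python
import random
import statistics

random.seed(7)  # reproducible for this sketch

TRUE_COST = 1.0  # hypothetical true running time, in ms

# One-sided errors: interference (cache misses, scheduling, GC, etc.)
# only ever *adds* time, so we model the noise as strictly positive.
measurements = [TRUE_COST + random.expovariate(1 / 0.2) for _ in range(1000)]

# The mean is inflated by the average error (here roughly +0.2).
print(statistics.mean(measurements))

# The minimum is just above the true cost.
print(min(measurements))
```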
All those other factors (ignore the warmup, check for a small stdev,
etc) seem good to me. But the minimum, not the mean, is still going to
be closer to the true cost of running the code.
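In practice that reasoning maps onto timeit's own API: timeit.repeat()
returns one total per repeat, and taking the minimum of those (rather
than their mean) follows the argument above. The statement timed here
is just a placeholder.

```python
import timeit

# Run the same timing loop several times; each entry in `timings` is
# (true cost + one-sided overhead) for `number` executions.
timings = timeit.repeat(stmt="sorted(range(1000))", repeat=5, number=1000)

# The smallest run has the least interference, so it is the best
# available estimate of the true cost.
best = min(timings)
print(f"best of {len(timings)}: {best:.6f} s per 1000 loops")
```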
----------
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue45261>
_______________________________________