[issue20499] Rounding errors with statistics.variance

Steven D'Aprano report at bugs.python.org
Thu Sep 9 08:02:18 EDT 2021


Steven D'Aprano <steve+python at pearwood.info> added the comment:

Regarding the call to sorted that you removed.

I just realised that this was buggy! In my testing, I found that there was a consistent speed-up by summing fractions in order of increasing denominators (small denominators first):


>>> from fractions import Fraction
>>> from timeit import Timer
>>> from random import shuffle
>>>
>>> d1 = [Fraction(1, d) for d in range(1, 1000)]
>>> d2 = d1[:]
>>> shuffle(d2)
>>>
>>> t1 = Timer('sum(d)', setup='from __main__ import d1 as d')
>>> t2 = Timer('sum(d)', setup='from __main__ import d2 as d')
>>> 
>>> min(t1.repeat(number=100, repeat=7)) # small dens first
1.048042511974927
>>> min(t1.repeat(number=100, repeat=7))
1.0449080819962546
>>>
>>> min(t2.repeat(number=100, repeat=7)) # random order
1.2761094039888121
>>> min(t2.repeat(number=100, repeat=7))
1.2763739289948717

Summing larger denominators before smaller denominators is even slower:

>>> d3 = list(reversed(d1))
>>> t3 = Timer('sum(d)', setup='from __main__ import d3 as d')
>>> 
>>> min(t3.repeat(number=100, repeat=7)) # large dens first
1.6581684999982826
>>> min(t3.repeat(number=100, repeat=7))
1.6580656860023737


But then I screwed up by sorting the *tuples* (num, den) which would have the effect of summing small *numerators* first, not denominators.

Thanks for catching that they way I had written it, the sort would have had at best no real performance benefit.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue20499>
_______________________________________


More information about the Python-bugs-list mailing list