[New-bugs-announce] [issue35904] Add statistics.fmean(seq)

Tue Feb 5 18:59:21 EST 2019

New submission from Raymond Hettinger <raymond.hettinger at gmail.com>:

The current mean() function makes heroic efforts to achieve last bit accuracy and when possible to retain the data type of the input.

What is needed is an alternative that has a simpler signature, that is much faster, that is highly accurate without demanding perfection, and that is usually what people expect mean() is going to do, the same as their calculators or numpy.mean():

   def fmean(seq: Sequence[float]) -> float:
       return math.fsum(seq) / len(seq)

On my current 3.8 build, this code given an approx 500x speed-up (almost three orders of magnitude).   Note that having a fast fmean() function is important in resampling statistics where the mean() is typically called many times:  http://statistics.about.com/od/Applications/a/Example-Of-Bootstrapping.htm 

$ ./python.exe -m timeit -r 11 -s 'from random import random' -s 'from statistics import mean' -s 'seq = [random() for i in range(10_000)]' 'mean(seq)'
50 loops, best of 11: 6.8 msec per loop

$ ./python.exe -m timeit -r 11 -s 'from random import random' -s 'from math import fsum' -s 'mean=lambda seq: fsum(seq)/len(seq)' -s 'seq = [random() for i in range(10_000)]' 'mean(seq)'
2000 loops, best of 11: 155 usec per loop

----------
assignee: steven.daprano
components: Library (Lib)
messages: 334894
nosy: rhettinger, steven.daprano, tim.peters
priority: normal
severity: normal
status: open
title: Add statistics.fmean(seq)
type: behavior
versions: Python 3.8

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35904>
_______________________________________