[Speed] Median +- MAD or Mean +- std dev?

Tue Mar 14 10:42:07 EDT 2017

On 14 March 2017 at 17:14, Serhiy Storchaka <storchaka at gmail.com> wrote:

> On 13.03.17 22:38, Antoine Pitrou wrote:
>
>> Additionally, while mean and std dev are generally quite well
>> understood, the properties of the median absolute deviation are
>> generally little known.
>>
>
> Std dev is well understood for the distribution close to normal. But when
> the distribution is too skewed or multimodal (as in your quick example)
> common assumptions (that 2/3 of samples are in the range of the std dev,
> 95% of samples are in the range of two std devs, 99% of samples are in the
> range of three std devs) are no longer valid.

That would suggest that the implicit assumption behind both proposals,
namely pairing a measure of central tendency with a symmetric measure of
deviation, may need to be challenged, as at least some meaningful
performance problems are going to show up as non-normal distributions in
the benchmark results.
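
As a rough illustration of that point, here is a minimal sketch using
made-up timings (not taken from any real benchmark run) that checks a
bimodal sample against the usual "about 2/3 within one std dev" rule of
thumb. The names `samples` and `fraction_within` are purely illustrative.

```python
import statistics

# Hypothetical timings in ms: a fast mode near 10 and a slow mode near 20.
samples = [10.0, 10.1, 9.9, 10.2, 9.8, 20.0, 20.1, 19.9, 20.2, 19.8]

mean = statistics.mean(samples)
stdev = statistics.stdev(samples)  # sample standard deviation


def fraction_within(data, center, spread, k):
    """Return the fraction of data lying within k * spread of center."""
    inside = [x for x in data if abs(x - center) <= k * spread]
    return len(inside) / len(data)


within_one = fraction_within(samples, mean, stdev, 1)

# Distance from the mean to the closest actual sample.
nearest_gap = min(abs(x - mean) for x in samples)

print(f"mean={mean:.2f} stdev={stdev:.2f}")
print(f"fraction within one std dev: {within_one:.2f}")
print(f"closest sample is {nearest_gap:.2f} ms away from the mean")
```

With this particular sample every value falls within one std dev of the
mean, while no individual run lands anywhere near the mean itself, so
"mean +- std dev" describes a timing that never actually occurred.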

Network services typically get around the "inherent variance" problem by
looking at a few key percentiles like 50%, 90% and 95%. Perhaps that would
be appropriate here as well?
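
A minimal sketch of what that could look like, assuming a simple
nearest-rank definition of a percentile (other definitions interpolate
between samples) and made-up timings with a slow tail. The `percentile`
helper is hypothetical and not part of any existing benchmark tool.

```python
# Hypothetical timings in ms: mostly around 10, with a few slow outliers.
samples = [
    10.1, 10.0, 10.2, 9.9, 10.1, 10.0, 10.3, 9.8, 10.2, 10.0,
    10.1, 9.9, 10.0, 10.2, 10.1, 10.0, 14.5, 15.0, 30.2, 31.0,
]


def percentile(data, pct):
    """Nearest-rank percentile for an integer pct in the range 1..100.

    Returns the smallest sample such that at least pct percent of the
    data is less than or equal to it.
    """
    ordered = sorted(data)
    # Integer ceiling of pct * n / 100, avoiding float rounding issues.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


p50 = percentile(samples, 50)
p90 = percentile(samples, 90)
p95 = percentile(samples, 95)

print(f"p50={p50} p90={p90} p95={p95}")
```

For this sample the 50th percentile is 10.1 ms while the 95th is 30.2 ms,
so the slow tail is reported directly rather than being folded into a
single symmetric spread figure.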

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia