[issue36546] Add quantiles() to the statistics module

Raymond Hettinger report at bugs.python.org
Mon Apr 8 03:42:33 EDT 2019


Raymond Hettinger <raymond.hettinger at gmail.com> added the comment:

Thanks for taking a detailed look.  I'll explore the links you provided shortly.

The API is designed to be extendable so that we don't get trapped by the choice of computation method.  If needed, any or all of the following extensions are possible without breaking backward compatibility:

  quantiles(data, n=4, already_sorted=True) # Skip resorting
  quantiles(data, cut_points=[0.02, 0.25, 0.50, 0.75, 0.98]) # box-and-whiskers
  quantiles(data, interp_method='nearest') # also: "low", "high", "midpoint"
  quantiles(data, inclusive=True)    # For description of a complete population

The default approach used in the PR matches what is used by MS Excel's PERCENTILE.EXC function¹.  That has several virtues.  It is easy to explain.  It allows two datasets of unequal size to be compared (perhaps with a QQ plot) to explore whether they are drawn from the same distribution.  For sampled data, the quantiles tend to remain stable as more samples are added.  For samples from a known distribution (e.g., normal variates), it tends to give the same results as inv_cdf():

    >>> iq = NormalDist(100, 15)
    >>> cohort = iq.samples(10_000)
    >>> for ref, est in zip(quantiles(iq, n=10), quantiles(cohort, n=10)):
    ...     print(f'{ref:5.1f}\t{est:5.1f}')
    ...
     80.8	 81.0
     87.4	 87.8
     92.1	 92.3
     96.2	 96.3
    100.0	100.1
    103.8	104.0
    107.9	108.0
    112.6	112.9
    119.2	119.3
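
As an illustration of why this method is easy to explain, here is a minimal pure-Python sketch of the PERCENTILE.EXC-style rule: the i-th of n cut points sits at 1-based rank i*(len(data)+1)/n in the sorted data, with linear interpolation between the two neighbouring values.  The name exclusive_quantiles is hypothetical and this is not the code from the PR; it only assumes the usual PERCENTILE.EXC rank rule.

```python
def exclusive_quantiles(data, n=4):
    """Hypothetical sketch: n-1 cut points using the rank rule i*(size+1)/n."""
    xs = sorted(data)
    size = len(xs)
    if n < 1:
        raise ValueError("n must be at least 1")
    if size < 2:
        raise ValueError("need at least two data points")
    m = size + 1
    cuts = []
    for i in range(1, n):
        # Integer part of the 1-based rank i*m/n.
        j = i * m // n
        # Clamp so that xs[j-1] and xs[j] both exist; ranks that fall
        # outside the data are extrapolated from the nearest pair.
        j = min(max(j, 1), size - 1)
        # delta/n is the fractional distance from xs[j-1] toward xs[j].
        delta = i * m - j * n
        cuts.append((xs[j - 1] * (n - delta) + xs[j] * delta) / n)
    return cuts


print(exclusive_quantiles([1, 2, 3, 4]))            # quartiles
print(exclusive_quantiles([1, 2, 3, 4, 5, 6, 7]))   # quartiles
```

With four data points the quartile ranks are 1.25, 2.5 and 3.75, so each cut point lies strictly inside the range of the data, which is what the "exclusive" naming refers to.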

My thought was to start with something like this and only add options if they get requested (the most likely request is an inclusive=True option to emulate MS Excel's PERCENTILE.INC).  
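
To make the difference between the two Excel conventions concrete, the sketch below uses quantiles() as it later shipped in Python 3.8, where the choice is spelled method='exclusive' (the default) or method='inclusive' rather than the inclusive=True flag floated in this message.  The sample data is made up for illustration.

```python
from statistics import quantiles

# Made-up sample; any sequence with at least two points works.
data = [1, 2, 3, 4]

# Default: treats data as a sample that may not contain the extremes
# of the population (PERCENTILE.EXC-style).
exc = quantiles(data, n=4)

# Inclusive: treats the minimum and maximum of data as the 0th and
# 100th percentiles (PERCENTILE.INC-style).
inc = quantiles(data, n=4, method='inclusive')

print(exc)
print(inc)
```

The median is the same under both conventions here; only the outer quartiles move, and the inclusive ones are pulled toward the middle of the data.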

If we need to leave the exact method unguaranteed, that's fine.  But I think it would be better to guarantee the match to PERCENTILE.EXC and then handle other requests through API extensions rather than revisions.


¹ https://exceljet.net/excel-functions/excel-percentile.exc-function

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue36546>
_______________________________________

