[SciPy-Dev] References for weibull_min and weibull_max distributions

Tue Jan 15 16:46:44 EST 2019

Hi Pierre,

My answers to your questions are below...

On 1/14/19, Pierre Haessig <pierre.haessig at crans.org> wrote:
> Hello,
>
> I just submitted a small PR to clarify the docstring of exponweib
> distribution in scipy.stats (https://github.com/scipy/scipy/pull/9679).
>
> However, in the process, I got a bit confused with weibull_min and
> weibull_max. It seems that up to Scipy 0.19, it was specified as an
> alias to Frechet left distribution
> (https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.stats.weibull_min.html).
> However this is not mentioned anymore since 1.0.

For a long time, weibull_min and weibull_max were aliases of frechet_l
and frechet_r.  The problem was that the implementations in frechet_l
and frechet_r were not what anyone calls the Fréchet distribution
these days.  They were, in fact, what are almost universally called
the Weibull distribution.  So in SciPy 1.0.0, those implementations
were moved to the weibull_min and weibull_max names, and the names
frechet_r and frechet_l were deprecated (see
https://github.com/scipy/scipy/pull/7838 for the details of the
changes).  The names frechet_l and frechet_r still exist, but if you
use any of their methods, you will get a deprecation warning.
(Unfortunately, it looks like I neglected to make a note of this
deprecation in the 1.0.0 release notes.)

The distribution that is generally known as *the* Fréchet distribution
is also known as the inverse Weibull distribution (see, for example,
https://en.wikipedia.org/wiki/Fr%C3%A9chet_distribution).  It is
implemented in SciPy as scipy.stats.invweibull.
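A quick numerical check of that correspondence, offered as a sketch (the shape value and grid are arbitrary): invweibull's CDF should match the standard Fréchet CDF exp(-x**(-c)) for x > 0, and, as the "inverse Weibull" name suggests, if X follows weibull_min(c) then 1/X follows invweibull(c). That means P(1/X <= x) = P(X >= 1/x), which is weibull_min's survival function evaluated at 1/x.

```python
import numpy as np
from scipy import stats

c = 2.5  # shape parameter (arbitrary value chosen for illustration)
x = np.linspace(0.1, 5.0, 50)

# Standard Frechet CDF (location 0, scale 1): exp(-x**(-c)) for x > 0
frechet_cdf = np.exp(-x ** (-c))
print(np.allclose(stats.invweibull.cdf(x, c), frechet_cdf))  # True

# Reciprocal relation: P(1/X <= x) = P(X >= 1/x) for X ~ weibull_min(c)
print(np.allclose(stats.invweibull.cdf(x, c),
                  stats.weibull_min.sf(1 / x, c)))  # True
```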

> Also, it seems that
> weibull_min corresponds to the usual Weibull distribution, but its
> docstring doesn't say it explicitly. Also, I find no references on the
> web for those Weibull min/max. Would it be appropriate, in the long
> term, to simply have a Weibull distribution?

The extreme value distributions arise as the limiting distributions of
the extreme value (i.e. the maximum or minimum) of a large number of
samples from some underlying distribution.  For a certain class of
underlying distributions, if you take the maximum, in the
(appropriately renormalized) limit you get the distribution that SciPy
calls weibull_max, and if you take the minimum, you get weibull_min.
(These distributions are related:  if F(x, c) is the CDF of
weibull_min with shape parameter c, then the CDF of weibull_max is 1 -
F(-x, c).)
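A rough simulation of that limit, as a sketch only (the underlying distribution, seed and sample sizes below are arbitrary choices for illustration): take X = U**(1/c) with U uniform on (0, 1), so that P(X <= x) = x**c near 0. The minimum of n samples, scaled by n**(1/c), should then be close to weibull_min(c), and the mirror-image construction for the maximum should be close to weibull_max(c). The last line checks the 1 - F(-x, c) relation directly.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)
c = 2.0        # shape parameter of the limiting distribution (illustrative)
n = 1000       # number of samples whose extreme value is taken
trials = 2000  # number of independent extremes collected

# Underlying samples X = U**(1/c): CDF x**c on [0, 1], lower endpoint 0
x = rng.uniform(size=(trials, n)) ** (1 / c)

# Renormalized minimum: P(n**(1/c) * min > t) = (1 - t**c/n)**n -> exp(-t**c)
minima = n ** (1 / c) * x.min(axis=1)
print(stats.kstest(minima, stats.weibull_min(c).cdf).statistic)

# Mirror image Y = 1 - X has upper endpoint 1; its renormalized maximum
# n**(1/c) * (max(Y) - 1) should approach weibull_max(c)
y = 1 - x
maxima = n ** (1 / c) * (y.max(axis=1) - 1)
print(stats.kstest(maxima, stats.weibull_max(c).cdf).statistic)

# CDF identity: weibull_max CDF at t equals 1 - (weibull_min CDF at -t)
t = np.linspace(-3.0, 0.0, 31)
print(np.allclose(stats.weibull_max.cdf(t, c),
                  1 - stats.weibull_min.cdf(-t, c)))  # True
```

Small Kolmogorov-Smirnov statistics (a few hundredths at these sample sizes) are what one would expect if the limits hold.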

The issue, then, is which one should be considered the "usual" Weibull
distribution?  The answer is not obvious.  For example, the
distribution described in the wikipedia article on the Weibull
distribution (https://en.wikipedia.org/wiki/Weibull_distribution)
corresponds to weibull_min.  This is also the distribution from which
numpy.random.weibull draws samples.  On the other hand, in the book
"An Introduction to Statistical Modeling of Extreme Values" by Stuart
Coles, and in the book "Modelling Extremal Events" by Embrechts,
Klüppelberg and Mikosch (two widely used texts on extreme value
theory), the distribution that is called the Weibull distribution
corresponds to SciPy's weibull_max.
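Both conventions can be checked against SciPy in a few lines. This is a sketch with arbitrary parameter values; it uses numpy.random.Generator.weibull, the newer interface to the same one-parameter (scale 1) Weibull sampler as numpy.random.weibull. The second part compares weibull_max, with loc and scale, against the Weibull-type extreme value CDF written in the form used in extreme value texts such as Coles, exp(-(-(z - b)/s)**alpha) for z < b.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2019)
a = 1.7  # shape parameter (illustrative)

# numpy's Weibull sampler (scale 1) compared with scipy's weibull_min
samples = rng.weibull(a, size=5000)
print(stats.kstest(samples, stats.weibull_min(a).cdf).statistic)

# Weibull-type extreme value CDF with location b, scale s, shape alpha:
# G(z) = exp(-(-(z - b)/s)**alpha) for z < b
b, s, alpha = 3.0, 2.0, 1.7
z = np.linspace(-5.0, 2.9, 40)  # stays below the upper endpoint b
g = np.exp(-(-(z - b) / s) ** alpha)
print(np.allclose(stats.weibull_max.cdf(z, alpha, loc=b, scale=s), g))  # True
```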

So I think we are better off *not* picking one to be called the
"usual" Weibull distribution.  The current names accurately describe
the basis of the two flavors of the distribution.  However, we should
improve the documentation to include this information about the
min/max distinction in their docstrings.  We should do the same for
gumbel_l and gumbel_r.  I'd be happy to make this change, but I
probably won't get to it in the near future, so I'd be even happier if
someone created a pull request that added this information to the
docstrings of weibull_min and weibull_max.  Similar updates for
gumbel_l and gumbel_r could be made at the same time or in a separate
pull request.
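The Gumbel pair has the same mirror-image structure, with gumbel_r being the flavor usually used for maxima and gumbel_l for minima. A sketch of the check (grid chosen arbitrarily), which could serve as a starting point for that docstring note:

```python
import numpy as np
from scipy import stats

x = np.linspace(-4.0, 4.0, 81)

# Reflection relation, analogous to weibull_min/weibull_max:
# CDF of gumbel_l at x equals 1 - (CDF of gumbel_r at -x)
print(np.allclose(stats.gumbel_l.cdf(x), 1 - stats.gumbel_r.cdf(-x)))  # True

# Closed forms: gumbel_r CDF exp(-exp(-x)); gumbel_l CDF 1 - exp(-exp(x))
print(np.allclose(stats.gumbel_r.cdf(x), np.exp(-np.exp(-x))))  # True
print(np.allclose(stats.gumbel_l.cdf(x), 1 - np.exp(-np.exp(x))))  # True
```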

(The original implementations of these extreme value distributions
date back to before my involvement with SciPy, so I can't say why the
Weibull distribution used the suffixes _min and _max while the other
distributions with two conventions used _l and _r, and I don't know
why we don't have two versions of the inverse Weibull distribution,
a.k.a. the Fréchet distribution.)

Warren