[SciPy-Dev] scipy.stats documentation

nicky van foreest vanforeest at gmail.com
Mon May 7 07:51:28 EDT 2012


Hi,

I am still struggling to understand parts of the scipy.stats package,
and ran into some obscure points.

1)

What is actually the shape parameter?  Let me include some references
to show my confusion here.

In expon it does not seem to exist:

https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L2770

Then, in Erlang it is called 'n'. I suppose this means the number
of stages. So in Erlang, why does the scale parameter then correspond
to the shape? BTW: should the scale not be explained in the erlang
docstring?

Then, from the gamma dist I learn the following:

https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L3382

So that would mean that in the expon dist the shape is set to 1.
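
To check that reading, here is a minimal sketch in plain Python. It is not
the scipy code: gamma_pdf and expon_pdf are names I made up for
illustration, and I assume the standard gamma density
x**(a-1) * exp(-x) / Gamma(a) with loc = 0 and scale = 1.

```python
import math


def gamma_pdf(x, a):
    """Standard gamma density with shape a (loc = 0, scale = 1 assumed)."""
    if x < 0:
        return 0.0
    return x ** (a - 1) * math.exp(-x) / math.gamma(a)


def expon_pdf(x):
    """Standard exponential density, exp(-x) for x >= 0."""
    return math.exp(-x) if x >= 0 else 0.0


# With shape a = 1 the gamma density reduces to the exponential density.
for x in (0.1, 1.0, 2.5):
    assert abs(gamma_pdf(x, 1.0) - expon_pdf(x)) < 1e-12
```

With a = 1 the factor x**(a-1) is 1 and Gamma(1) = 1, so only exp(-x)
remains, which is consistent with expon having no shape parameter of its own.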

Then, here:

http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.erlang.html

it states that the shape parameter should be an int, but in the
examples section it is set to 0.9, i.e., the documentation states
this:

>>> from scipy.stats import erlang
>>> numargs = erlang.numargs
>>> [ n ] = [0.9,] * numargs
>>> rv = erlang(n)

from which I infer that the shape is set to 0.9.

All in all, I don't quite know what to expect with regard to the use
and purpose of shape.

Is the shape parameter explained somewhere explicitly? If not,
wouldn't the stats tutorial be the best place? Who is the author of
this doc? How can I help change it?

2)

Would it be a good idea to make the use of the loc and scale parameters
explicit in the docstrings of the distributions? I recall that, as a
first-time user, I had no clue what they meant, and that it took some
struggling and searching to figure out what they came down to.
Besides, the docstrings are not always complete. For instance, this
is the docstring for the expon distribution:

The probability density function for `expon` is::

        expon.pdf(x) = exp(-x)

    for ``x >= 0``.

    The scale parameter is equal to ``scale = 1.0 / lambda``.

So, what is lambda here? Is it pdf(x) = lambda * exp(-x * lambda), or
is it pdf(x) = exp(-x/lambda)/lambda? After some experimentation I
found out, but the documentation is not explicit enough in my opinion.
Suppose we would restate it like this:

cdf(x) = 1. - exp( -(x-loc)/scale).

Then I think it would be clear immediately, and also
interpretation-free. Likewise for other distributions.
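
As a sketch of what I mean, in plain Python rather than the scipy
implementation: expon_cdf and expon_pdf below are hypothetical helper
names, written directly from the formula above, and I assume the
distribution has support x >= loc.

```python
import math


def expon_cdf(x, loc=0.0, scale=1.0):
    """cdf(x) = 1 - exp(-(x - loc) / scale) for x >= loc, else 0."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    z = (x - loc) / scale
    return 1.0 - math.exp(-z) if z >= 0 else 0.0


def expon_pdf(x, loc=0.0, scale=1.0):
    """Derivative of expon_cdf: exp(-(x - loc) / scale) / scale for x >= loc."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    z = (x - loc) / scale
    return math.exp(-z) / scale if z >= 0 else 0.0


# With scale = 1 / lam, the density equals lam * exp(-lam * x),
# i.e. lambda is the rate and scale is the mean.
lam = 2.0
x = 0.5
assert abs(expon_pdf(x, scale=1.0 / lam) - lam * math.exp(-lam * x)) < 1e-12
```

Written this way, scale = 1.0 / lambda can only mean that lambda is the
rate, so the first of the two readings above is the one that applies.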

3)
I am really willing to help improve stats and make the documentation
more consistent at points, but I don't quite know where to start.  In the
process I raise all these points. Is this list the best place, or
should I send my comments to Josef (?)?

Nicky


