[SciPy-Dev] scipy.stats documentation

josef.pktd at gmail.com josef.pktd at gmail.com
Mon May 7 15:41:37 EDT 2012


On Mon, May 7, 2012 at 2:20 PM, nicky van foreest <vanforeest at gmail.com> wrote:
> Hi,
>
> Many thanks for your explanations. I think it is best to just respond
> to the points below that are not yet completely clear to me. I'll
> remove the rest.
>
>> All continuous distributions have a generic treatment of loc and scale
>> independent of whether this is part of a standard definition of the
>> distribution. (discrete distributions only have a loc shift.)
>> (location-scale families of distributions)
>>
>> dist.cdf((x-loc)/scale) = dist.cdf(x, loc=loc, scale=scale)
>> the standard distributions have loc=0, scale=1
>> pdf and other methods follow from this
>
> I know this, but it took me some searching to find this out (already a
> few years ago.) I would like to make this somewhat easier to
> understand for first time users. Besides this, if I read (for the
> exponential)
>
> cdf(x) = 1-exp( -(x-loc)/scale)
>
> it is perfectly clear that 1/scale represents the rate.
>
> The confusing point is that text books, wikipedia, and so on, are
> inconsistent. Hence, before I use any distribution in scipy.stats I
> first run a few tests to be sure that using scale works the way I
> expect. If, on the other hand, the docstring were completely
> explicit, including the use of loc and scale (even though it is
> overkill as loc and scale are explained somewhere else), no confusion
> should arise, at least not on my part.
>
> As an example, I think as a first time user the following doc string
> would have helped me the most (Hopefully I am not too biased here):
>
>    """An exponential continuous random variable.
>
>    %(before_notes)s
>
>    Notes
>    -----
>    The cumulative distribution function for `expon` is::
>
>        expon.cdf(x) = 1. - exp(-(x-loc)/scale)
>
>    To compute
>
>        cdf(x) = 1. - exp(-lambda * x)
>
>    it is required to take ``scale = 1.0/lambda``; since ``loc = 0``
>    by default, it is not necessary to set ``loc = 0.`` explicitly.
>
>    The shape parameter is not implemented for ``expon`` as ``loc``
>    and ``scale`` suffice.
>
>    %(example)s
>
>    """

Yes, that looks good to me.
A few problems:
In most cases we currently have the pdf in the docs, which is, I guess,
more familiar to most users.
I'm not sure having 1./scale in front and (x-loc)/scale inside makes
the pdf easier to read or understand, but it's more explicit.
For many distributions we don't have an explicit expression for the
cdf, so loc and scale should still be understandable from the general
documentation.
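
To make the "1./scale in front and (x-loc)/scale inside" form concrete,
here is a minimal sketch in plain Python. The helper names expon_pdf and
expon_cdf and the values of lam and x are illustrative assumptions, not
scipy code; they only spell out the generic loc/scale transform for the
exponential case.

```python
import math

def expon_pdf(x, loc=0.0, scale=1.0):
    """Exponential pdf with the generic loc/scale transform written out."""
    y = (x - loc) / scale          # standardized argument
    return math.exp(-y) / scale if y >= 0 else 0.0

def expon_cdf(x, loc=0.0, scale=1.0):
    """Exponential cdf: 1 - exp(-(x - loc) / scale) on the support."""
    y = (x - loc) / scale
    return 1.0 - math.exp(-y) if y >= 0 else 0.0

lam = 2.0   # illustrative rate (assumption, not from the thread)
x = 0.7

# Textbook form with a rate parameter ...
textbook_cdf = 1.0 - math.exp(-lam * x)
textbook_pdf = lam * math.exp(-lam * x)

# ... matches the loc/scale form when scale = 1/lambda and loc = 0.
scaled_cdf = expon_cdf(x, scale=1.0 / lam)
scaled_pdf = expon_pdf(x, scale=1.0 / lam)

# Generic identity: cdf(x, loc, scale) equals the standard cdf at (x - loc) / scale.
shifted = expon_cdf(1.5, loc=1.0, scale=2.0)
standard = expon_cdf((1.5 - 1.0) / 2.0)
```

With scipy installed, expon.cdf(x, scale=1.0/lam) should agree with
scaled_cdf here, since scale = 1/lambda is the convention discussed above.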


>
>> Sometimes the parameterization in stats.distributions is "a bit
>> difficult" to translate to the standard parameterization, example
>> lognormal that regularly raises questions.
>
> Ok. Can we resolve this by making the doc-string more explicit, like
> my example above?

Yes
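
For the lognormal case mentioned above, a small sketch of how the usual
textbook parameters map onto shape and scale. The names mu and sigma
(mean and standard deviation of log(X)), the helper functions, and the
numbers are illustrative assumptions; the mapping shown is shape s = sigma
and scale = exp(mu), with loc left at 0.

```python
import math

def norm_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognorm_cdf(x, s, loc=0.0, scale=1.0):
    """Standard lognormal cdf with shape s, plus the generic loc/scale."""
    y = (x - loc) / scale
    return norm_cdf(math.log(y) / s) if y > 0 else 0.0

mu, sigma = 0.5, 0.8   # illustrative parameters of log(X)
x = 2.0

# Textbook form: P(X <= x) = Phi((log(x) - mu) / sigma)
textbook = norm_cdf((math.log(x) - mu) / sigma)

# Same value through shape and scale: s = sigma, scale = exp(mu)
via_scale = lognorm_cdf(x, s=sigma, scale=math.exp(mu))

# The median of X is exp(mu), so the cdf there is one half.
at_median = lognorm_cdf(math.exp(mu), s=sigma, scale=math.exp(mu))
```

In scipy terms this would correspond to lognorm(sigma, scale=exp(mu));
note that mu enters through scale, not through loc.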

>
>> these are generic template numbers and could be replaced by
>> distribution specific docstrings
>
> Ok. How about fixing part for part?

If someone is going through individual distributions, this would be
very good. (My initial worry a few years ago was that it would be
difficult to maintain 90 individual docstrings.)

>
>> online editing is the easiest.
>
> Many of these points seem, at least to me, too minor to raise a
> ticket, or am I mistaken here?

No individual tickets are necessary. One possibility is a pull request
with many changes.
I usually prefer the online doc system if you have edit permission;
otherwise sign up and ping the list.

here is the tutorial for editing
http://docs.scipy.org/scipy/docs/scipy-docs/tutorial/stats.rst/

the distribution docstrings are a bit trickier:
don't edit the generated docstring of the instance, e.g.
http://docs.scipy.org/scipy/docs/scipy.stats.expon/edit/
I think that would create a mess

The docstring of the class with template is here
http://docs.scipy.org/scipy/docs/scipy.stats.distributions.expon_gen/edit/
The only way I found the link was by going through the milestones and
looking for xxx_gen:
http://docs.scipy.org/scipy/Milestones/Milestones_11/


>
>> I wrote the stats tutorial a long time ago, and it contains the
>> description of individual distributions written by Travis.
>> I haven't looked at the overall documentation for the distributions in a while.
>
> Where can I find the tutorial? Then I'll try to add some description
> about the shape parameter, and improve/add parts where necessary.

http://docs.scipy.org/scipy/docs/scipy-docs/tutorial/stats.rst/
or in the source
https://github.com/scipy/scipy/blob/master/doc/source/tutorial/stats.rst

Josef
>
>>
>> Suggestions, or, even better, direct improvements in the doc editor or
>> with pull request would be very welcome.
>
> I'll try that.
>
>> Nicky, thanks for looking into this.
>
> I am happy to be able to do something in return. python, scipy, and
> stats, made my life (in some respects :-) much easier.
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev


