[SciPy-Dev] stats.distributions.py documentation

Sun Oct 7 14:33:41 EDT 2012

Hi Ralf,

Sorry for being so slow in getting back to your comments. I have
definitely not forgotten this mail. However, for the next few weeks I
have to do a considerable amount of teaching... Once my workload is a bit
lower, I'll come up with a plan.

Nicky

On 21 September 2012 21:27, Ralf Gommers <ralf.gommers at gmail.com> wrote:
>
>
> On Sun, Sep 16, 2012 at 11:10 PM, nicky van foreest <vanforeest at gmail.com>
> wrote:
>>
>> Hi,
>>
>> Below are two proposals to handle the documentation of the scipy
>> distributions.
>>
>> The first is to add a set of examples to each distribution, see the
>> list at the end of the mail as an example. However, I actually wonder
>> whether it wouldn't be better to put this stuff in the stats tutorial.
>> (I recently updated this, but given the list below, it is still not
>> complete.) The list below is a bit long... too long perhaps.
>>
>> I actually get the feeling that, given the enormous diversity of the
>> distributions, it may not be possible to automatically generate a set
>> of simple examples that work for each and every distribution. Such
>> examples then would involve the usage of x.dist.b, and so on, and this
>> is not particularly revealing to first (and second) time users.
>
>
> This is exactly what the problem is currently.
>
>>
>> A possible resolution is to include just one or two generic examples
>> in the example doc string (e.g., dist.rvs(size = (2,3)) ), and refer
>> to the tutorial for the rest. The tutorial then should show extensive
>> examples for each method of the norm distribution. I assume that then
>> any user of other distributions can figure out how to proceed for
>> his/her own distribution.
>
>
> This is a huge amount of work, and the generic example still won't run if
> you copy-paste it into a terminal.
>
>>
>>
>> The second possibility would be to follow Josef's suggestion:
>> --snip snip
>> Splitting up the distributions pdf docs in tutorial into separate
>> pages for individual distributions, make them nicer with code and
>> graphs and link them from the docstring of the distribution.
>
>
> Linking to the tutorial from the docstrings is a good idea, but the
> docstrings themselves should be enough to get started.
>>
>>
>> This would keep the docstring itself from blowing up, but we could get
>> the full html reference if we need to.
>>
>> --snip snip
>>
>> This idea offers a lot of opportunities. In a previous mail I
>> mentioned that I don't quite like that the documentation is spread
>> over multiple documents. There are doc strings in distributions.py
>> (leading to a bloated file),
>
>
> It's not that bad imho. The typical docstring looks like:
> """A beta prime continuous random variable.
>
>     %(before_notes)s
>
>     Notes
>     -----
>     The probability density function for `betaprime` is::
>
>         betaprime.pdf(x, a, b) =
>             gamma(a+b) / (gamma(a)*gamma(b)) * x**(a-1) * (1+x)**(-a-b)
>
>     for ``x > 0``, ``a > 0``, ``b > 0``.
>
>     %(example)s
> """
>
> It can't be much shorter than that.
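
A minimal sketch of how `%(before_notes)s` and `%(example)s` placeholders of this kind can be expanded, assuming plain %-style dictionary substitution. The dictionary name `docdict` and the fragment texts are illustrative assumptions, not the actual contents of distributions.py:

```python
# Hypothetical template fragments; the real shared text is much longer.
docdict = {
    "before_notes": "Methods\n-------\nrvs(size=1)\n    Random variates.",
    "example": "Examples\n--------\n>>> from scipy.stats import betaprime",
}

# A shortened docstring template with the two placeholders quoted above.
template = """A beta prime continuous random variable.

%(before_notes)s

Notes
-----
The probability density function for `betaprime` is given here.

%(example)s
"""

# %-formatting with a dict replaces every %(key)s placeholder by its value.
filled = template % docdict
print(filled)
```

The per-distribution text stays short because everything shared lives in the dictionary; the price is that the shared example has to make sense for every distribution.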
>
>
>> and there is continuous.rst. Part of the
>> implementation can be understood from the doc-string, typically, the
>> density function, but not the rest;
>
>
> The pdf and support are given, that's enough to define the distribution. So
> that should stay. It doesn't mean we have to copy the whole wikipedia page
> for each distribution.
>
>>
>> this requires continuous.rst.
>> Besides this, in case some specific distribution requires extra
>> explanation/examples, this will have to be put in the doc-string, making
>> distributions.py longer still. Thus, to take up Josef's suggestion,
>> what about a documentation file organised like this:
>
>
> Are you suggesting a reST page here, or a .py file with only docs, and new
> magic to make part of the content show up as docstring? The former sounds
> better to me.
>
>>
>>
>> # some tag to tell that these are the docs for the norm distribution
>> # eg.
>> # norm_gen
>>
>> Normal Distribution
>> ----------------------------
>>
>> Notes
>> ^^^^^^^
>> # should be used by the interpreter
>> The probability density function for `norm` is::
>>
>>        norm.pdf(x) = exp(-x**2/2)/sqrt(2*pi)
>>
>> Simple Examples
>> ^^^^^^^^^^^^^^^^^^^^
>> # used by the interpreter
>>      >>> norm.rvs( size = (2,3) )
>>
>> Extensive Examples
>> ^^^^^^^^^^^^^^^^^^^^^^^^
>> # Not used by the interpreter, but certainly by a html viewer,
>> # containing graphs, hard/specific examples.
>>
>> Mathematical Details
>> ^^^^^^^^^^^^^^^^^^^^^^
>>
>> Stuff from continuous.rst
>>
>> # dist2_gen
>> Distribution number 2
>> -----------------------------------------
>> etc
>>
>> It shouldn't be too hard to parse such a document, and couple each
>> piece of documentation to a distribution in distributions.py (or am I
>> mistaken?) as we use the class name as the tag in the documentation
>> file. The doc-string for a distribution in distributions.py can then
>> be removed.
>>
>> Nicky
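
A rough sketch of the parsing step proposed above, assuming the tag is a comment line holding the class name and that only the "Notes" and "Simple Examples" sections go into the docstring. The file contents, the helper names `split_by_tag` and `docstring_part`, and the stand-in class are all hypothetical:

```python
import re

# Hypothetical documentation file in the layout proposed above.
DOC_SOURCE = """\
# norm_gen
Normal Distribution
-------------------

Notes
^^^^^
The probability density function for `norm` is::

    norm.pdf(x) = exp(-x**2/2)/sqrt(2*pi)

Extensive Examples
^^^^^^^^^^^^^^^^^^
Graphs and harder examples, for the html docs only.

# dist2_gen
Distribution number 2
---------------------

Notes
^^^^^
Notes for the second distribution.
"""

TAG = re.compile(r"^# (\w+_gen)\s*$")            # e.g. "# norm_gen"
DOCSTRING_SECTIONS = {"Notes", "Simple Examples"}


def split_by_tag(text):
    """Map each class-name tag to the lines that follow it."""
    docs, current = {}, None
    for line in text.splitlines():
        match = TAG.match(line)
        if match:
            current = match.group(1)
            docs[current] = []
        elif current is not None:
            docs[current].append(line)
    return docs


def docstring_part(lines):
    """Keep only the sections meant for the interpreter docstring."""
    kept, keep = [], False
    for i, line in enumerate(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if nxt.startswith("^^^"):                 # this line is a section title
            keep = line.strip() in DOCSTRING_SECTIONS
        if keep:
            kept.append(line)
    return "\n".join(kept).strip()


class norm_gen:                                   # stand-in for the real class
    pass


classes = {"norm_gen": norm_gen}                  # explicit tag -> class lookup
for name, lines in split_by_tag(DOC_SOURCE).items():
    if name in classes:
        classes[name].__doc__ = docstring_part(lines)

print(norm_gen.__doc__)
```

Tags without a matching class (here `dist2_gen`) are simply skipped, so the documentation file and the code can get out of sync without anything breaking; a real implementation would probably want to warn about that.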
>>
>> Example for the examples section of the docstring of norm.
>
>
> This example is good. Perhaps the frozen distribution needs a few words of
> explanation. I suggest doing a few more of these for common distributions,
> and link to the norm() docstring from less common distributions. Other than
> that, I wouldn't change anything about the docstrings. Built docs could be
> reworked more thoroughly.
>
> Ralf
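
As a possible starting point for those few words on frozen distributions: calling `norm(...)` fixes loc and scale once and returns an object with the usual methods, so the parameters need not be repeated in every call. A minimal sketch; the variable names are illustrative only:

```python
from scipy.stats import norm

# Freezing fixes loc and scale once; the result keeps the usual methods.
frozen = norm(loc=1.0, scale=2.0)

# The same probability, computed through the frozen object and directly.
p_frozen = frozen.cdf(0.0)
p_direct = norm.cdf(0.0, loc=1.0, scale=2.0)

print(p_frozen, p_direct)
print(frozen.mean(), frozen.std())
```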
>
>
>>
>>
>>     Notes
>>     -----
>>     The probability density function for `norm` is::
>>
>>         norm.pdf(x) = exp(-x**2/2)/sqrt(2*pi)
>>
>>     #%(example)s
>>
>>     Examples
>>     --------
>>
>>     Setting the mean and standard deviation:
>>
>>         >>> from scipy.stats import norm
>>         >>> norm.cdf(0.0)
>>         >>> norm.cdf(0., 1) # set mu = loc = 1
>>         >>> norm.cdf(0., 1, 2) # mu = loc = 1, scale = sigma = 2
>>         >>> norm.cdf(0., loc = 1,  scale = 2) # mu = loc = 1, scale = sigma = 2
>>
>>     Frozen rvs
>>
>>         >>> norm(1., 2.).cdf(0)
>>         >>> x = norm(scale = 2.)
>>         >>> x.cdf(0.0)
>>
>>     Moments
>>
>>         >>> norm(loc = 2).stats()
>>         >>> norm.mean()
>>         >>> norm.moment(2, scale = 3.)
>>         >>> x.std()
>>         >>> x.var()
>>
>>     Random number generation
>>
>>         >>> norm.rvs(3, 1, size = (2,3)) # loc = 3, scale = 1, array of shape (2,3)
>>         >>> norm.rvs(3, 1, size = [2,3])
>>         >>> x.rvs(3)     # array with 3 random deviates
>>         >>> x.rvs([3,4]) # array of shape (3,4) with deviates
>>
>>     Expectations
>>
>>         >>> norm.expect(lambda x: x, loc = 1) # 1.00000
>>         >>> norm.expect(lambda x: x**2, loc = 1., scale = 2.) # second moment
>>
>>     Support of the distribution
>>
>>         >>> norm.a # left limit, -np.inf here
>>         >>> norm.b # right limit, np.inf here
>>
>>     Plot of the cdf
>>
>>         >>> import numpy as np
>>         >>> import matplotlib.pyplot as plt
>>         >>> x = np.linspace(0, 3)
>>         >>> P = norm.cdf(x)
>>         >>> plt.plot(x,P)
>>         >>> plt.show()
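
A small check of the loc/scale convention used in the examples above, written as a plain script rather than a doctest. It assumes only the standard relation cdf(x; loc, scale) = cdf((x - loc)/scale) and E[X**2] = loc**2 + scale**2 for the normal distribution; the two sides should agree up to numerical error:

```python
from scipy.stats import norm

x0, loc, scale = 0.0, 1.0, 2.0

# Passing loc and scale is the same as standardizing the argument first.
shifted = norm.cdf(x0, loc, scale)
standardized = norm.cdf((x0 - loc) / scale)

# Second moment via numerical integration, and in closed form.
second_moment = norm.expect(lambda x: x**2, loc=loc, scale=scale)
closed_form = loc**2 + scale**2

print(shifted, standardized)
print(second_moment, closed_form)
```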
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-dev