[SciPy-User] log pdf, cdf, etc

josef.pktd at gmail.com josef.pktd at gmail.com
Fri May 28 10:15:55 EDT 2010


On Fri, May 28, 2010 at 7:29 AM, Chris Strickland
<christophermarkstrickland at gmail.com> wrote:
> Hi,
>
> When using any of the distributions of scipy.stats there does not seem to be
> the ability (or at least I cannot figure out how) to have the function return
> the log of the pdf, cdf, sf, etc. For statistical analysis this is essential.
> For instance, suppose we are interested in an exponential distribution for a
> random variable x with a hyperparameter lambda (the scale). There needs to be
> an option that returns -log(lambda)-x/lambda. It is not sufficient
> (numerically) to calculate log(scipy.stats.expon.pdf(x, scale=lambda)).
>
> Is there a way to do this using the distributions in scipy.stats?

It would need a new method for each distribution, e.g. _loglike or _logpdf.
So this is real work, and for some distributions the log wouldn't simplify much.
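
To illustrate the numerical point with the exponential example from the
question, here is a minimal sketch in plain Python. The names expon_pdf and
expon_logpdf are hypothetical stand-ins, not scipy.stats methods, and lambda
is assumed to be the scale (mean) parameter, matching -log(lambda)-x/lambda.

```python
import math

def expon_pdf(x, scale):
    """Density of an exponential distribution with mean `scale` (x >= 0)."""
    return math.exp(-x / scale) / scale

def expon_logpdf(x, scale):
    """Log density computed analytically: -log(scale) - x / scale."""
    return -math.log(scale) - x / scale

x, scale = 1000.0, 1.0

# The analytical form stays finite far out in the tail.
direct = expon_logpdf(x, scale)

# The pdf itself underflows to 0.0 here, so taking its log fails.
try:
    naive = math.log(expon_pdf(x, scale))
except ValueError:
    naive = float("-inf")

print(direct, naive)
```

Far in the tail the analytical log density is still an ordinary finite
number, while the log of the pdf is lost to underflow.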

I proposed this once together with other improvements (but without response).

The second useful method for estimation would be _fitstart, which
provides distribution-specific starting values for fit, e.g. a moment
estimator or a simple rule of thumb:
http://projects.scipy.org/scipy/ticket/808
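
As a rough sketch of what a moment-based starting value could look like, the
example below uses the gamma distribution, where mean = shape * scale and
variance = shape * scale**2. The choice of gamma and the function name
gamma_fitstart are assumptions made for illustration, not taken from the
ticket.

```python
import statistics

def gamma_fitstart(data):
    """Method-of-moments starting values (shape, scale) for a gamma fit.

    Solves mean = shape * scale and var = shape * scale**2 for the
    two parameters, using the sample mean and sample variance.
    """
    mean = statistics.mean(data)
    var = statistics.variance(data)  # sample variance (n - 1 denominator)
    shape = mean ** 2 / var
    scale = var / mean
    return shape, scale

sample = [1.0, 2.0, 3.0, 4.0, 5.0]
shape0, scale0 = gamma_fitstart(sample)
print(shape0, scale0)
```

Values like these would only seed the optimizer; the maximum likelihood
fit would still refine them.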


Here are some of my currently planned enhancements to the distributions:

http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/stats/distributions_patch.py

But I just checked, and it looks like I forgot to copy over the _loglike
method that I started in my experimental scripts.

For a few distributions, where this is possible, it would also be
useful to add the gradient with respect to the parameters (or even
the Hessian). This is currently mostly just an idea, motivated by our
need for some analytical gradients for estimation in statsmodels.
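
For the exponential case the analytical gradient is simple enough to write
down, so here is a small sketch of the idea. The function names are
hypothetical, and the parameter is again assumed to be the scale. The
log-likelihood of n observations is -n*log(scale) - sum(x)/scale, so its
derivative with respect to scale is -n/scale + sum(x)/scale**2.

```python
import math

def expon_loglike(data, scale):
    """Log-likelihood of i.i.d. exponential data with mean `scale`."""
    n = len(data)
    return -n * math.log(scale) - sum(data) / scale

def expon_score(data, scale):
    """Analytical derivative of the log-likelihood with respect to `scale`."""
    n = len(data)
    return -n / scale + sum(data) / scale ** 2

data = [0.5, 1.5, 2.0, 4.0]

# Compare with a central finite difference at an arbitrary point.
scale, h = 1.5, 1e-6
numeric = (expon_loglike(data, scale + h) - expon_loglike(data, scale - h)) / (2 * h)
analytic = expon_score(data, scale)

# The gradient vanishes at the sample mean, which is the MLE of the scale.
at_mle = expon_score(data, sum(data) / len(data))
print(analytic, numeric, at_mle)
```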


>
> If there is not, is it possible for me to suggest that this feature be added?
> There is such an excellent range of distributions, each with such an
> impressive range of options, that it seems a shame to have to code up the
> log of pdfs mostly by hand, and often to call the log of CDFs from R.

So far I only thought about log pdf, because I wanted it for Maximum
Likelihood estimation.

Do you have a rough idea for which distributions log cdf would work?
That is, for which distributions is an analytical or efficient
numerical expression possible?
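
The exponential distribution is one case where both are available in
closed form: log sf is exactly -x/scale, and log cdf is log(1 - exp(-x/scale)).
The sketch below (hypothetical function names, scale parameterization
assumed) evaluates the latter with expm1 and log1p so that it stays accurate
both near zero and far in the tail.

```python
import math

def expon_logsf(x, scale):
    """Log survival function of the exponential: exactly -x / scale."""
    return -x / scale

def expon_logcdf(x, scale):
    """Log cdf of the exponential, log(1 - exp(-x / scale)), for x >= 0."""
    z = x / scale
    if z <= 0.0:
        return float("-inf")  # cdf is 0 at x = 0
    if z < math.log(2.0):
        # Small z: 1 - exp(-z) suffers cancellation, expm1 avoids it.
        return math.log(-math.expm1(-z))
    # Large z: exp(-z) is tiny, log1p keeps its contribution.
    return math.log1p(-math.exp(-z))

# In the tail the naive formula rounds 1 - exp(-50) to exactly 1.0.
naive_tail = math.log(1.0 - math.exp(-50.0))
stable_tail = expon_logcdf(50.0, 1.0)

# Near zero, -expm1(-z) keeps full precision where 1 - exp(-z) would be 0.
stable_small = expon_logcdf(1e-20, 1.0)
print(naive_tail, stable_tail, stable_small)
```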

I also think that scipy.stats.distributions could be one of the best
(broadest, most consistent) collections of univariate distributions that I
have seen so far, once we fill in some missing pieces.

As a way forward, I think we could make the distributions into a
numerical encyclopedia by adding private methods, like log pdf and
log cdf, to those distributions where it makes sense. I also started
to add characteristic functions to some distributions in my
experimental scripts.
If you have a collection of logpdf and logcdf expressions, we could open
a trac ticket for this.

However, private methods would miss the generic broadcasting part of the
public functions (pdf, cdf, ...), but for estimation I wouldn't necessarily
call those anyway because of the overhead.


I'm working on this on and off, so it's moving only slowly (and my
wishlist is big). For example, I was reading up on extreme value
distributions in actuarial science and hydrology to get a better
overview of the estimators.


So, I'd really love to hear any ideas and feedback, and to see
contributions to improving the distributions.

Josef


>
> Thanks,
> Chris.


