[SciPy-User] log pdf, cdf, etc

Chris Strickland christophermarkstrickland at gmail.com
Fri May 28 21:03:40 EDT 2010


On Sat, May 29, 2010 at 12:15 AM, <josef.pktd at gmail.com> wrote:

>
> It would need a new method for each distribution, e.g. _loglike, _logpdf
> So, this is work, and for some distributions the log wouldn't simplify
> much.
>
I am not sure what you mean by "the log wouldn't simplify much".


> I proposed this once together with other improvements (but without
> response).
>
This is a little disappointing; it significantly reduces how useful the
library is. In fact, I have not been able to use a single function for
anything other than testing (although I have been using numpy.random for
random numbers, this scipy.stats collection seems far more complete). This
would change dramatically if a log version of each distribution were
available. I think in most cases this would be a straightforward addition, at
least for the pdf.
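
To illustrate why a dedicated log pdf matters, here is a minimal sketch for the
normal distribution using only the Python standard library. The function names
normal_pdf and normal_logpdf are made up for this example and are not part of
scipy.stats. Far in the tail the pdf underflows to zero, so taking its log
fails, while the analytical log pdf stays finite.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2); underflows to 0.0 far in the tails."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_logpdf(x, mu=0.0, sigma=1.0):
    """Log density written out analytically, so no exp/log round trip."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


x = 40.0
print(normal_pdf(x))     # the density underflows to 0.0 here
print(normal_logpdf(x))  # still a finite number, roughly -800.9
# math.log(normal_pdf(x)) would raise ValueError because the argument is 0.0
```

For maximum likelihood or MCMC the log density is evaluated at many points and
summed, so this kind of underflow is easy to hit with the plain pdf.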



> The second useful method for estimation would be _fitstart, which
> provides distribution specific starting values for fit, e.g. a moment
> estimator or a simple rule of thumb
> http://projects.scipy.org/scipy/ticket/808
>
>
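
As a rough sketch of what distribution-specific starting values could look
like, the following computes method-of-moments estimates for a gamma
distribution. The name gamma_fitstart and its signature are hypothetical; this
is not the _fitstart from the ticket, only an illustration of the idea using
the standard library.

```python
import statistics


def gamma_fitstart(data):
    """Method-of-moments starting values (shape, scale) for a gamma fit.

    Uses mean = shape * scale and variance = shape * scale**2.
    Assumes all observations are positive and not all identical.
    """
    m = statistics.mean(data)
    v = statistics.pvariance(data, mu=m)
    shape = m * m / v
    scale = v / m
    return shape, scale


shape0, scale0 = gamma_fitstart([1.0, 2.0, 3.0, 4.0, 5.0])
print(shape0, scale0)  # starting values to hand to a numerical optimizer
```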
> Here are some of my currently planned enhancements to the distributions:
>
>
> http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/stats/distributions_patch.py
>
> but I just checked, it looks like I forgot to copy the _loglike method
> that I started from my experimental scripts.
>
> For a few distributions, where this is possible, it would also be
> useful to add the gradient with respect to the parameters, (or even
> the Hessian). But this is currently mostly just an idea, since we need
> some analytical gradients in the estimation of stats models.
>
This certainly would be nice as well.
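
For a case where the gradient is available in closed form, here is a small
sketch for the normal log-likelihood with respect to mu and sigma. The function
names are illustrative only and do not correspond to anything in scipy.stats or
statsmodels.

```python
import math


def normal_loglike(data, mu, sigma):
    """Log-likelihood of i.i.d. N(mu, sigma^2) observations."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return (-n * math.log(sigma)
            - 0.5 * n * math.log(2.0 * math.pi)
            - ss / (2.0 * sigma ** 2))


def normal_loglike_grad(data, mu, sigma):
    """Analytical gradient (d/dmu, d/dsigma) of normal_loglike."""
    n = len(data)
    resid = [x - mu for x in data]
    d_mu = sum(resid) / sigma ** 2
    d_sigma = -n / sigma + sum(r * r for r in resid) / sigma ** 3
    return d_mu, d_sigma


data = [0.5, 1.5, 2.0, 3.0]
print(normal_loglike_grad(data, mu=1.0, sigma=2.0))
```

A gradient-based optimizer can use this directly instead of approximating the
derivatives by finite differences.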

>
> >
> > If there is not, is it possible for me to suggest that this feature be
> > added? There is such an excellent range of distributions, each with such
> > an impressive range of options, it seems a shame to have to mostly
> > manually code up the log of pdfs and often call the log of CDFs from R.
>
> So far I only thought about log pdf, because I wanted it for Maximum
> Likelihood estimation.
>
It is also necessary for MCMC.


> Do you have a rough idea for which distributions log cdf would work?
> That is, for which distributions is an analytical or efficient
> numerical expression possible?
>

Not sure off the top of my head, as I mainly require only the pdf. I was,
however, doing a little survival analysis the other day, and the log cdf was
required there. The logs of the survival and hazard functions would also be
nice. So far I have only required the exponential (analytical), Weibull
(analytical), normal (numerical) and powernormal (an analytical function of
the log of the normal cdf). I just had a peek at the R source code for pnorm
(R's code for the normal cdf). The function is not big and is licensed
under the GNU General Public License. I assume it could be fairly easily
ported to scipy.
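
The analytical cases listed above can be sketched in a few lines of
standard-library Python. All function names here are hypothetical. The
powernormal survival function is assumed to be Phi(-x)**c, following the
parameterisation of scipy.stats.powernorm, and norm_logcdf is a naive version
that still underflows far in the lower tail, which is where a port of a more
careful routine would be needed.

```python
import math


def expon_logsf(x, rate):
    """Log survival function of the exponential: log S(x) = -rate * x."""
    return -rate * x


def expon_logcdf(x, rate):
    """Log cdf of the exponential for x > 0, using expm1 for accuracy."""
    return math.log(-math.expm1(-rate * x))


def weibull_logsf(x, shape, scale):
    """Log survival function of the Weibull: -(x / scale) ** shape."""
    return -((x / scale) ** shape)


def weibull_loghazard(x, shape, scale):
    """Log hazard of the Weibull for x > 0."""
    return (math.log(shape) - math.log(scale)
            + (shape - 1.0) * (math.log(x) - math.log(scale)))


def norm_logcdf(x):
    """Naive log of the standard normal cdf; fails once the cdf underflows."""
    return math.log(0.5 * math.erfc(-x / math.sqrt(2.0)))


def powernorm_logsf(x, c):
    """Log survival function of the powernormal, assuming S(x) = Phi(-x)**c."""
    return c * norm_logcdf(-x)


print(expon_logsf(2.0, rate=0.5), weibull_logsf(2.0, shape=1.0, scale=2.0))
print(powernorm_logsf(0.0, c=2.0))
```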

>
> I also think that scipy.stats.distributions could be one of the best
> (broadest, consistent) collection of univariate distributions that I
> have seen so far, once we fill in some missing pieces.
>
> As a way forward, I think we could make the distributions into a
> numerical encyclopedia by adding private methods to those
> distributions where it makes sense, like log pdf, log cdf and I also
> started to add characteristic functions to some distributions in my
> experimental scripts.
> If you have a collection of logpdf, logcdf, we could add a trac ticket for
> this.
>

I could fairly easily whip up a collection of functions to compute the logpdf
for a large number of distributions. I am not sure about the CDFs, but I can
look into them as well. The pdfs are definitely far more urgent for my own
work. I am a bit busy at work for the next three weeks, though, so it would
have to be after that.

>
> However, this would miss the generic broadcasting part of the public
> functions, pdf, cdf,... but for estimation I wouldn't necessarily call
> those because of the overhead.
>
>
> I'm working on and off on this, so it's moving only slowly (and my
> wishlist is big).
> (for example, I was reading up on extreme value distributions in
> actuarial science and hydrology to get a better overview over the
> estimators.)
>
>
> So, I really love to hear any ideas, feedback, and see contributions
> to improving the distributions.
>
> Josef
>
>
>