[SciPy-User] log pdf, cdf, etc

josef.pktd at gmail.com josef.pktd at gmail.com
Sat May 29 17:20:25 EDT 2010


On Sat, May 29, 2010 at 4:51 PM, Travis Oliphant <oliphant at enthought.com> wrote:
>
> On May 28, 2010, at 9:15 AM, josef.pktd at gmail.com wrote:
>
>> On Fri, May 28, 2010 at 7:29 AM, Chris Strickland
>> <christophermarkstrickland at gmail.com> wrote:
>>> Hi,
>>>
>>> When using any of the distributions of scipy.stats there does not seem to be
>>> the ability (or at least I cannot figure out how) to have the function
>>> return the log of the pdf, cdf, sf, etc. For statistical analysis this is
>>> essential.
>>> For instance, suppose we are interested in an exponential distribution for a
>>> random variable x with a hyperparameter lambda; there needs to be an option
>>> that returns -log(lambda)-x/lambda. It is not sufficient (numerically) to
>>> calculate log(scipy.stats.expon.pdf(x,lambda)).
>>>
>>> Is there a way to do this using the distributions in scipy.stats?
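A small illustration of the numerical point above, using only the standard
library. It assumes the scale parameterisation pdf(x) = exp(-x/lambda)/lambda
implied by the formula -log(lambda)-x/lambda. The function names are
illustrative only and are not part of scipy.stats.

```python
import math

def expon_pdf(x, lam):
    """Density of the exponential distribution with scale lam, for x >= 0."""
    return math.exp(-x / lam) / lam

def expon_logpdf(x, lam):
    """Log-density computed directly, without forming the density first."""
    return -math.log(lam) - x / lam

def naive_logpdf(x, lam):
    """log(pdf(x)); gives -inf once the density has underflowed to 0.0."""
    p = expon_pdf(x, lam)
    return math.log(p) if p > 0.0 else float("-inf")

# Moderate x: both routes agree to rounding error.
print(naive_logpdf(3.0, 2.0), expon_logpdf(3.0, 2.0))

# Far tail: exp(-1000) underflows to 0.0, so the naive route loses the value,
# while the direct expression still returns a finite number.
print(naive_logpdf(1000.0, 1.0), expon_logpdf(1000.0, 1.0))
```

In a likelihood sum, a single -inf term of this kind makes the whole
objective useless to an optimizer, which is why the direct log expression
matters.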
>>
>> It would need a new method for each distribution, e.g. _loglike, _logpdf
>> So, this is work, and for some distributions the log wouldn't simplify much.
>>
>> I proposed this once together with other improvements (but without response).
>>
>> The second useful method for estimation would be _fitstart, which
>> provides distribution specific starting values for fit, e.g. a moment
>> estimator, or a simple rule of thumb
>> http://projects.scipy.org/scipy/ticket/808
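As a rough sketch of what distribution-specific starting values could look
like, here is a method-of-moments estimator for a gamma distribution. The
choice of the gamma distribution and the function name gamma_fitstart are
assumptions made for illustration; the code in the ticket is not shown in
this thread and may look quite different.

```python
import statistics

def gamma_fitstart(data):
    """Method-of-moments starting values (shape, scale) for a gamma fit.

    Uses the identities mean = shape * scale and
    variance = shape * scale**2 (hypothetical helper, not scipy API).
    """
    mean = statistics.fmean(data)
    var = statistics.pvariance(data)
    if mean <= 0 or var <= 0:
        raise ValueError("need positive mean and variance")
    scale = var / mean
    shape = mean / scale
    return shape, scale

sample = [0.8, 1.9, 2.4, 3.1, 4.6, 0.5, 2.2, 1.3]
shape0, scale0 = gamma_fitstart(sample)
print(shape0, scale0)
```

Values like these would only be handed to the optimizer as an initial guess;
the maximum likelihood fit itself would still refine them.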
>>
>>
>> Here are some of my currently planned enhancements to the distributions:
>>
>> http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/stats/distributions_patch.py
>
> Hey Josef,
>
> I've been playing with distributions.py today and added logpdf, logcdf, logsf methods (based on _logpdf, _logcdf, _logsf methods in each distribution).
>
> I also added your _fitstart suggestion.   I would like to do something like your nnlf_fit method that allows you to fix some parameters and only solve for others, but I haven't thought through all the issues yet.

>
> Do you have updated code I could look at? These are relatively easy adds that I would like to put in today. Do you have check-in rights to SciPy?

I just committed the changes for _logpdf, ..., plus a fix to the
internal wrapcauchy_cdf. I didn't see any changes of yours in the
timeline or in the svn changes.

generic _logpdf, logcdf and the 13 cases of my test script are in svn
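For readers following along, the pattern of a generic fallback with
per-distribution overrides might look roughly like the sketch below. The
class names and the half-normal example are hypothetical stand-ins chosen for
illustration; this is not the code that was committed to svn.

```python
import math

class Distribution:
    """Hypothetical stand-in for a distribution base class."""

    def _pdf(self, x, *args):
        raise NotImplementedError

    def _logpdf(self, x, *args):
        # Generic fallback: take the log of the density.
        p = self._pdf(x, *args)
        return math.log(p) if p > 0.0 else float("-inf")

    def logpdf(self, x, *args):
        # Public method: argument checking would go here, then delegate.
        return self._logpdf(x, *args)

class HalfNormal(Distribution):
    """Defines only _pdf, so it relies on the generic fallback."""

    def _pdf(self, x):
        return math.sqrt(2.0 / math.pi) * math.exp(-0.5 * x * x)

class HalfNormalLog(HalfNormal):
    """Overrides _logpdf with the analytical expression."""

    def _logpdf(self, x):
        return 0.5 * math.log(2.0 / math.pi) - 0.5 * x * x

# Near the mode the two agree; far in the tail only the override is finite.
print(HalfNormal().logpdf(1.0), HalfNormalLog().logpdf(1.0))
print(HalfNormal().logpdf(50.0), HalfNormalLog().logpdf(50.0))
```

The generic version works for every distribution that defines _pdf, so only
the distributions where the log simplifies need their own _logpdf.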

Josef

>
> Thanks,
>
> -Travis
>
>>
>> but I just checked, it looks like I forgot to copy the _loglike method
>> that I started from my experimental scripts.
>>
>> For a few distributions, where this is possible, it would also be
>> useful to add the gradient with respect to the parameters, (or even
>> the Hessian). But this is currently mostly just an idea, since we need
>> some analytical gradients in the estimation of stats models.
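To make the gradient idea concrete, here is a sketch for the exponential
distribution with scale lam, where the log pdf is -log(lam) - x/lam and its
derivative with respect to lam is -1/lam + x/lam**2. The function names are
illustrative and not part of scipy.stats; the finite-difference check is only
there to confirm the analytical expression.

```python
import math

def expon_logpdf(x, lam):
    """Log-density of the exponential distribution with scale lam."""
    return -math.log(lam) - x / lam

def expon_score(x, lam):
    """Analytical derivative of the log-density with respect to lam."""
    return -1.0 / lam + x / lam ** 2

def numeric_score(x, lam, h=1e-6):
    """Central finite-difference approximation of the same derivative."""
    return (expon_logpdf(x, lam + h) - expon_logpdf(x, lam - h)) / (2.0 * h)

print(expon_score(3.0, 2.0), numeric_score(3.0, 2.0))

# At the maximum likelihood estimate (the sample mean for this model)
# the scores summed over the sample are zero up to rounding.
sample = [0.5, 1.5, 2.0, 4.0]
lam_hat = sum(sample) / len(sample)
total_score = sum(expon_score(x, lam_hat) for x in sample)
print(total_score)
```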
>>
>>
>>>
>>> If there is not, is it possible for me to suggest that this feature be added?
>>> There is such an excellent range of distributions, each with such an
>>> impressive range of options, it seems a shame to have to mostly manually code
>>> up the log of pdfs and often call the log of CDFs from R.
>>
>> So far I only thought about log pdf, because I wanted it for Maximum
>> Likelihood estimation.
>>
>> Do you have a rough idea for which distributions log cdf would work?
>> that is, for which distribution is an analytical or efficient
>> numerical expression possible.
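The exponential distribution is one case where analytical log cdf and log sf
expressions exist. The sketch below assumes scale lam and uses
logsf(x) = -x/lam and logcdf(x) = log(1 - exp(-x/lam)), evaluated with expm1
or log1p depending on the size of x/lam. Function names are illustrative, not
scipy API.

```python
import math

def expon_logsf(x, lam):
    """log(1 - cdf(x)) for x >= 0; exact, no exponential needed."""
    return -x / lam

def expon_logcdf(x, lam):
    """log(cdf(x)) = log(1 - exp(-x/lam)) for x > 0."""
    t = x / lam
    if t > math.log(2.0):
        # exp(-t) is small here: log1p keeps the tiny negative result.
        return math.log1p(-math.exp(-t))
    # t is small here: -expm1(-t) equals 1 - exp(-t) without cancellation.
    return math.log(-math.expm1(-t))

# Upper tail: the naive log(exp(-1000)) would be log(0.0); this stays finite.
print(expon_logsf(1000.0, 1.0))

# Lower tail: 1 - exp(-1e-20) rounds to 0.0 in floating point,
# but the expm1 route still returns roughly log(1e-20).
print(expon_logcdf(1e-20, 1.0))

# Upper tail of the cdf: a tiny negative number rather than exactly 0.0.
print(expon_logcdf(50.0, 1.0))
```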
>>
>> I also think that scipy.stats.distributions could be one of the best
>> (broadest, most consistent) collections of univariate distributions that I
>> have seen so far, once we fill in some missing pieces.
>>
>> As a way forward, I think we could make the distributions into a
>> numerical encyclopedia by adding private methods to those
>> distributions where it makes sense, like log pdf and log cdf. I also
>> started to add characteristic functions to some distributions in my
>> experimental scripts.
>> If you have a collection of logpdf, logcdf, we could add a trac ticket for this.
>>
>> However, this would miss the generic broadcasting part of the public
>> functions, pdf, cdf,... but for estimation I wouldn't necessarily call
>> those because of the overhead.
>>
>>
>> I'm working on and off on this, so it's moving only slowly (and my
>> wishlist is big).
>> (For example, I was reading up on extreme value distributions in
>> actuarial science and hydrology to get a better overview of the
>> estimators.)
>>
>>
>> So, I would really love to hear any ideas and feedback, and to see contributions
>> to improving the distributions.
>>
>> Josef
>>
>>
>>>
>>> Thanks,
>>> Chris.
>>>
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>
>>>
>
> ---
> Travis Oliphant
> Enthought, Inc.
> oliphant at enthought.com
> 1-512-536-1057
> http://www.enthought.com
>
>
>
>


