[SciPy-User] log pdf, cdf, etc

josef.pktd at gmail.com josef.pktd at gmail.com
Sat May 29 17:38:46 EDT 2010


On Sat, May 29, 2010 at 4:51 PM, Travis Oliphant <oliphant at enthought.com> wrote:
>
> On May 28, 2010, at 9:15 AM, josef.pktd at gmail.com wrote:
>
>> On Fri, May 28, 2010 at 7:29 AM, Chris Strickland
>> <christophermarkstrickland at gmail.com> wrote:
>>> Hi,
>>>
>>> When using any of the distributions of scipy.stats there does not seem to be
>>> the ability (or at least I cannot figure out how) to have the function
>>> return the log of the pdf, cdf, sf, etc. For statistical analysis this is
>>> essential.
>>> For instance, suppose we are interested in an exponential distribution for a
>>> random variable x with a hyperparameter lambda; there needs to be an option
>>> that returns -log(lambda)-x/lambda. It is not sufficient (numerically) to
>>> calculate log(scipy.stats.expon.pdf(x, scale=lambda)).
>>>
>>> Is there a way to do this using the distributions in scipy.stats?
>>
>> It would need a new method for each distribution, e.g. _loglike, _logpdf.
>> So, this is work, and for some distributions the log wouldn't simplify much.
>>
>> I proposed this once together with other improvements (but without response).
>>
>> The second useful method for estimation would be _fitstart, which
>> provides distribution specific starting values for fit, e.g. a moment
>> estimator, or a simple rule of thumb
>> http://projects.scipy.org/scipy/ticket/808
>>
>>
>> Here are some of my currently planned enhancements to the distributions:
>>
>> http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/stats/distributions_patch.py
>
> Hey Josef,
>
> I've been playing with distributions.py today and added logpdf, logcdf, logsf methods (based on _logpdf, _logcdf, _logsf methods in each distribution).

I would like to get the private _logpdf in a useful (vectorized or
broadcastable) version because for estimation and optimization, I want
to avoid the overhead of the public logpdf wrapper. So, my testing will
be on the underscore versions.
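
A minimal sketch of why a distribution-specific log pdf matters, using the
exponential case from Chris's message. It uses only the Python standard
library rather than scipy.stats, and the function names (expon_pdf,
expon_logpdf, naive_logpdf) are made up for illustration; it is not the
_logpdf code under discussion.

```python
import math


def expon_pdf(x, scale):
    """Density of an exponential distribution with mean `scale` (x >= 0)."""
    return math.exp(-x / scale) / scale


def expon_logpdf(x, scale):
    """Log density computed directly as -log(scale) - x / scale."""
    return -math.log(scale) - x / scale


def naive_logpdf(x, scale):
    """log(pdf(x)); gives -inf once the pdf has underflowed to 0.0."""
    p = expon_pdf(x, scale)
    return math.log(p) if p > 0.0 else -math.inf


# Moderate x: both routes agree up to rounding.
print(naive_logpdf(5.0, 2.0), expon_logpdf(5.0, 2.0))

# Far tail: exp(-1000) underflows, so only the direct formula stays finite.
print(naive_logpdf(2000.0, 2.0), expon_logpdf(2000.0, 2.0))
```

For small x the two agree; once exp(-x/scale) underflows to zero, only the
direct formula still returns a finite value, which is what a likelihood
optimizer needs.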

>
> I also added your _fitstart suggestion. I would like to do something like your nnlf_fit method that allows you to fix some parameters and only solve for others, but I haven't thought through all the issues yet.

I have written a semi-frozen fit function and posted to the mailing
list a long time ago, but since I'm not sure about the API and I'm
expanding to several new estimators, I kept this under
work-in-progress.
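
To make the idea of fixing some parameters concrete, here is a rough
sketch; it is not the semi-frozen fit function mentioned above. It assumes
a normal likelihood, a hypothetical freeze() helper in which None marks a
free parameter, and a hand-written golden-section search so that it runs
without SciPy.

```python
import math


def normal_nnlf(params, data):
    """Negative log-likelihood of a normal sample; params = (loc, scale)."""
    loc, scale = params
    if scale <= 0:
        return math.inf
    n = len(data)
    ss = sum((x - loc) ** 2 for x in data)
    return n * math.log(scale) + 0.5 * n * math.log(2 * math.pi) + ss / (2 * scale ** 2)


def freeze(nnlf, frozen):
    """Return an objective that takes only the free parameters.

    `frozen` has one entry per parameter: a number fixes that parameter,
    None leaves it free (hypothetical convention for this sketch).
    """
    def reduced(free, data):
        it = iter(free)
        full = [next(it) if f is None else f for f in frozen]
        return nnlf(full, data)
    return reduced


def golden_min(f, lo, hi, tol=1e-10):
    """Golden-section search for the minimum of a unimodal f on [lo, hi]."""
    invphi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - invphi * (b - a)
        else:
            a, c = c, d
            d = a + invphi * (b - a)
    return (a + b) / 2


data = [1.2, 0.7, 2.9, 1.8, 2.4, 0.3, 1.1, 2.0]

# Fix loc at 0 and estimate only the scale.
reduced = freeze(normal_nnlf, frozen=[0.0, None])
scale_hat = golden_min(lambda s: reduced([s], data), 1e-6, 100.0)

# With loc fixed at 0 the MLE of scale is sqrt(mean(x**2)), a handy check.
closed_form = math.sqrt(sum(x * x for x in data) / len(data))
print(scale_hat, closed_form)
```

The open API question is how to tell fit which parameters are fixed; the
list with None entries is only one of several possible conventions.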

Similarly, _fitstart might need extra options for estimation when some
parameters are fixed, e.g. there are good moment estimators that work
when some of the parameters (e.g. loc or scale) are fixed. Also,
_fitstart is currently used only by my fit_frozen.
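
As an illustration of a moment estimator that works once loc is fixed, a
small sketch for the gamma distribution follows. The function name
gamma_fitstart and its signature are assumptions for this example, not the
_fitstart method discussed here, and it uses only the standard library.

```python
import statistics


def gamma_fitstart(data, loc=0.0):
    """Moment-based starting values (shape, loc, scale) for a gamma fit.

    With loc fixed, the shifted data have mean = shape * scale and
    variance = shape * scale**2, so shape = mean**2 / var and
    scale = var / mean.
    """
    shifted = [x - loc for x in data]
    m = statistics.mean(shifted)
    v = statistics.pvariance(shifted)
    if m <= 0 or v <= 0:
        raise ValueError("need positive mean and variance after removing loc")
    return m * m / v, loc, v / m


data = [2.1, 3.4, 2.8, 5.0, 4.2]
shape0, loc0, scale0 = gamma_fitstart(data, loc=1.0)
print(shape0, loc0, scale0)
```

The returned values reproduce the sample mean and variance of the shifted
data exactly, which usually makes them a reasonable starting point for a
likelihood optimizer.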

I was hoping to get this done this year, maybe together with the
enhancements that Per Brodtkorb proposed two years ago, e.g. Method of
Maximum Spacings.

I also have a Generalized Method of Moments estimator based on
matching quantiles and moments in the works.

So, I don't want to be pinned down yet to any API for the estimation
enhancements.

Josef

>
> Do you have updated code I could look at? These are relatively easy adds that I would like to put in today. Do you have check-in rights to SciPy?
>
> Thanks,
>
> -Travis
>
>>
>> but I just checked, it looks like I forgot to copy the _loglike method
>> that I started from my experimental scripts.
>>
>> For a few distributions, where this is possible, it would also be
>> useful to add the gradient with respect to the parameters, (or even
>> the Hessian). But this is currently mostly just an idea, since we need
>> some analytical gradients in the estimation of stats models.
>>
>>
>>>
>>> If there is not, is it possible for me to suggest that this feature be added?
>>> There is such an excellent range of distributions, each with such an
>>> impressive range of options, that it seems a shame to have to mostly
>>> manually code up the log of pdfs and often call the log of CDFs from R.
>>
>> So far I only thought about log pdf, because I wanted it for Maximum
>> Likelihood estimation.
>>
>> Do you have a rough idea for which distributions log cdf would work?
>> That is, for which distributions is an analytical or efficient
>> numerical expression possible?
>>
>> I also think that scipy.stats.distributions could be one of the best
>> (broadest, most consistent) collections of univariate distributions that I
>> have seen so far, once we fill in some missing pieces.
>>
>> As a way forward, I think we could make the distributions into a
>> numerical encyclopedia by adding private methods to those
>> distributions where it makes sense, like log pdf, log cdf and I also
>> started to add characteristic functions to some distributions in my
>> experimental scripts.
>> If you have a collection of logpdf, logcdf, we could add a trac ticket for this.
>>
>> However, this would miss the generic broadcasting part of the public
>> functions, pdf, cdf,... but for estimation I wouldn't necessarily call
>> those because of the overhead.
>>
>>
>> I'm working on and off on this, so it's moving only slowly (and my
>> wishlist is big).
>> (for example, I was reading up on extreme value distributions in
>> actuarial science and hydrology to get a better overview over the
>> estimators.)
>>
>>
>> So, I would really love to hear any ideas and feedback, and to see
>> contributions to improving the distributions.
>>
>> Josef
>>
>>
>>>
>>> Thanks,
>>> Chris.
>>>
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>
>>>
>
> ---
> Travis Oliphant
> Enthought, Inc.
> oliphant at enthought.com
> 1-512-536-1057
> http://www.enthought.com