[SciPy-User] log pdf, cdf, etc

josef.pktd at gmail.com josef.pktd at gmail.com
Sat May 29 17:58:31 EDT 2010


On Sat, May 29, 2010 at 5:38 PM,  <josef.pktd at gmail.com> wrote:
> On Sat, May 29, 2010 at 4:51 PM, Travis Oliphant <oliphant at enthought.com> wrote:
>>
>> On May 28, 2010, at 9:15 AM, josef.pktd at gmail.com wrote:
>>
>>> On Fri, May 28, 2010 at 7:29 AM, Chris Strickland
>>> <christophermarkstrickland at gmail.com> wrote:
>>>> Hi,
>>>>
>>>> When using any of the distributions of scipy.stats there does not seem to be
>>>> the ability (or at least I cannot figure out how) to have the function
>>>> return
>>>> the log of the pdf, cdf, sf, etc. For statistical analysis this is
>>>> essential.
>>>> For instance suppose we are interested in an exponential distribution for a
>>>> random variable x with a hyperparameter lambda there needs to be an option
>>>> that returns -log(lambda)-x/lambda. It is not sufficient (numerically) to
>>>> calculate log(scipy.stats.expon.pdf(x, scale=lambda)).
>>>>
>>>> Is there a way to do this using the distributions in scipy.stats?
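A minimal sketch of the numerical problem described above, assuming lambda is meant as the scale of the exponential (so it is passed as scale=, since the second positional argument of expon.pdf is loc). The values of lam and x are made up for illustration, and the last line assumes a SciPy release that ships the public logpdf method discussed in this thread.

```python
import numpy as np
from scipy import stats

lam = 2.0                                  # illustrative scale parameter
x = np.array([1.0, 100.0, 2000.0])         # illustrative sample points

# Closed-form log density for scale lam: -log(lam) - x/lam
logpdf_direct = -np.log(lam) - x / lam

# Naive route: evaluate the pdf, then take the log.
# For large x the pdf underflows to 0.0, so the log is -inf.
with np.errstate(divide="ignore"):
    logpdf_naive = np.log(stats.expon.pdf(x, scale=lam))

# Assumes a SciPy version that provides logpdf on the distributions.
logpdf_scipy = stats.expon.logpdf(x, scale=lam)
```

The two routes agree for moderate x, but only the log-space versions stay finite in the far tail.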
>>>
>>> It would need a new method for each distribution, e.g. _loglike, _logpdf
>>> So, this is work, and for some distributions the log wouldn't simplify much.
>>>
>>> I proposed this once together with other improvements (but without response).
>>>
>>> The second useful method for estimation would be _fitstart, which
>>> provides distribution-specific starting values for fit, e.g. a moment
>>> estimator, or a simple rule of thumb
>>> http://projects.scipy.org/scipy/ticket/808
>>>
>>>
>>> Here are some of my currently planned enhancements to the distributions:
>>>
>>> http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/stats/distributions_patch.py
>>
>> Hey Josef,
>>
>> I've been playing with distributions.py today and added logpdf, logcdf, logsf methods (based on _logpdf, _logcdf, _logsf methods in each distribution).
>
> I would like to get the private _logpdf in a useful (vectorized or
> broadcastable) version because for estimation and optimization, I want
> to avoid the logpdf overhead. So, my testing will be on the underscore
> versions.
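A rough sketch of the public/private split mentioned above, assuming a SciPy release where rv_continuous subclasses may define _logpdf and the public logpdf wraps it. ExponSketch is a hypothetical toy distribution, not part of scipy.stats.

```python
import numpy as np
from scipy import stats

class ExponSketch(stats.rv_continuous):
    """Hypothetical standard exponential on [0, inf), for illustration only."""

    def _pdf(self, x):
        return np.exp(-x)

    def _logpdf(self, x):
        # Log density written out directly, with no exp-then-log round trip.
        return -x

dist = ExponSketch(a=0.0, name="expon_sketch")

x = np.array([0.5, 10.0, 2000.0])
scale = 2.0

# Public method: checks arguments, applies loc/scale, broadcasts.
public = dist.logpdf(x, scale=scale)

# Private method on standardized values: the caller does the scale
# arithmetic, which is the overhead being skipped.
private = dist._logpdf(x / scale) - np.log(scale)
```

Calling the underscore method directly only makes sense when the inputs are already known to be valid and inside the support, since none of the checks in the public wrapper run.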
>
>>
>> I also added your _fitstart suggestion.   I would like to do something like your nnlf_fit method that allows you to fix some parameters and only solve for others, but I haven't thought through all the issues yet.
>
> I have written a semi-frozen fit function and posted to the mailing
> list a long time ago, but since I'm not sure about the API and I'm
> expanding to several new estimators, I kept this under
> work-in-progress.
>
> Similarly, _fitstart might need extra options for estimation when some
> parameters are fixed, e.g. there are good moment estimators that work
> when some of the parameters (e.g. loc or scale) are fixed. Also
> _fitstart is currently used only by my fit_frozen.
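As an illustration of fitting with some parameters held fixed, here is a sketch that assumes a SciPy release whose fit method accepts the floc/fscale keywords (that API is not settled anywhere in this thread). The gamma distribution, sample size, and seed are arbitrary choices, and the moment estimates only show the kind of starting value a _fitstart could supply.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = stats.gamma.rvs(3.0, loc=0.0, scale=2.0, size=500, random_state=rng)

# Moment estimators with loc fixed at 0: mean = a*scale, var = a*scale**2.
m, v = data.mean(), data.var()
shape0, scale0 = m * m / v, v / m

# Maximum likelihood with loc held fixed at 0; only shape and scale are estimated.
shape_hat, loc_hat, scale_hat = stats.gamma.fit(data, floc=0)
```

With loc fixed, the moment estimates are cheap and usually land close to the maximum likelihood values, which is why they make reasonable starting points.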
>
> I was hoping to get this done this year, maybe together with the
> enhancements that Per Brodtkorb proposed two years ago, e.g. Method of
> Maximum Spacings.
>
> I also have a Generalized Method of Moments estimator based on
> matching quantiles and moments in the works.
>
> So, I don't yet want to be pinned down to any API for the estimation
> enhancements.
>
> Josef
>
>>
>> Do you have updated code I could look at?   These are relatively easy additions that I would like to put in today.     Do you have check-in rights to SciPy?

http://projects.scipy.org/scipy/log/trunk/scipy/stats/distributions.py

>>
>> Thanks,
>>
>> -Travis
>>
>>>
>>> but I just checked, it looks like I forgot to copy the _loglike method
>>> that I started from my experimental scripts.
>>>
>>> For a few distributions, where this is possible, it would also be
>>> useful to add the gradient with respect to the parameters, (or even
>>> the Hessian). But this is currently mostly just an idea, since we need
>>> some analytical gradients in the estimation of stats models.
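To make the gradient idea concrete, a small sketch for the exponential case from the original question, with scale lam: the log-likelihood is -n*log(lam) - sum(x)/lam, so its derivative with respect to lam is -n/lam + sum(x)/lam**2. The function names loglike and score are hypothetical helpers, not scipy.stats methods, and the data are simulated.

```python
import numpy as np

def loglike(lam, x):
    # Exponential log-likelihood with scale lam.
    return -x.size * np.log(lam) - x.sum() / lam

def score(lam, x):
    # Analytical derivative of loglike with respect to lam.
    return -x.size / lam + x.sum() / lam**2

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=200)

# Compare against a central finite difference at an arbitrary point.
lam, h = 1.5, 1e-6
numeric = (loglike(lam + h, x) - loglike(lam - h, x)) / (2 * h)
analytic = score(lam, x)

# The gradient vanishes at the maximum likelihood estimate, lam = mean(x).
score_at_mle = score(x.mean(), x)
```

An optimizer that is handed the analytical score avoids the extra function evaluations and the rounding error of the finite-difference version.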
>>>
>>>
>>>>
>>>> If there is not, may I suggest that this feature be added?
>>>> There is such an excellent range of distributions, each with such an
>>>> impressive range of options, that it seems a shame to have to code up the
>>>> log of pdfs mostly by hand and often call the log of CDFs from R.
>>>
>>> So far I only thought about log pdf, because I wanted it for Maximum
>>> Likelihood estimation.
>>>
>>> Do you have a rough idea for which distributions log cdf would work?
>>> That is, for which distributions is an analytical or efficient
>>> numerical expression possible?
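One case where a log-space tail function clearly pays off is the normal distribution. The sketch below assumes a SciPy release that provides logsf on the distributions; the evaluation point x = 40 is arbitrary, chosen so that the survival function underflows in double precision.

```python
import numpy as np
from scipy import stats

x = 40.0

# sf(40) is below the smallest representable double, so it is 0.0
# and its log is -inf.
with np.errstate(divide="ignore"):
    naive = np.log(stats.norm.sf(x))

# Evaluated in log space, the same quantity is finite.
direct = stats.norm.logsf(x)
```

The same pattern applies to logcdf in the lower tail.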
>>>
>>> I also think that scipy.stats.distributions could be one of the best
>>> (broadest, most consistent) collections of univariate distributions that I
>>> have seen so far, once we fill in some missing pieces.
>>>
>>> As a way forward, I think we could make the distributions into a
>>> numerical encyclopedia by adding private methods to those
>>> distributions where it makes sense, like log pdf, log cdf and I also
>>> started to add characteristic functions to some distributions in my
>>> experimental scripts.
>>> If you have a collection of logpdf, logcdf, we could add a trac ticket for this.
>>>
>>> However, this would miss the generic broadcasting part of the public
>>> functions, pdf, cdf,... but for estimation I wouldn't necessarily call
>>> those because of the overhead.
>>>
>>>
>>> I'm working on and off on this, so it's moving only slowly (and my
>>> wishlist is big).
>>> (For example, I was reading up on extreme value distributions in
>>> actuarial science and hydrology to get a better overview of the
>>> estimators.)
>>>
>>>
>>> So, I would really love to hear any ideas, feedback, and see contributions
>>> to improving the distributions.
>>>
>>> Josef
>>>
>>>
>>>>
>>>> Thanks,
>>>> Chris.
>>>>
>>>> _______________________________________________
>>>> SciPy-User mailing list
>>>> SciPy-User at scipy.org
>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>
>>>>
>>
>> ---
>> Travis Oliphant
>> Enthought, Inc.
>> oliphant at enthought.com
>> 1-512-536-1057
>> http://www.enthought.com
>>
>>
>>
>>
>


