[SciPy-Dev] Tweedie distributions in scipy.stats

josef.pktd at gmail.com josef.pktd at gmail.com
Fri Mar 13 12:04:11 EDT 2020


Aside:
compound Poisson is a convolution of distributions and not a finite mixture.
Allowing an infinite mixing distribution like Poisson creates numerical
problems in the upper tail that are not easy to solve in general.
In most cases, computation have to be truncated at the upper tail, but then
the problem is to figure out the truncation threshold for a required
precision.
My guess is that this would be a lot of work to get it to scipy standards.

I was looking at the general case for convolution and compound poisson a
long time ago, mainly using fft to get the pdf and cdf from the
characteristic function,, the cf is relatively simple to compute for
convolutions. The references in extreme value and risk applications that I
looked at, was emphasizing tail precision and ways how to work around it,
or comparing different methods in how precise they are.
fft was fast, but I only eyeballed the truncation threshold for my examples.

Josef


On Fri, Mar 13, 2020 at 11:46 AM <josef.pktd at gmail.com> wrote:

>
>
> On Fri, Mar 13, 2020 at 11:16 AM Christian Lorentzen <
> lorentzen.ch at gmail.com> wrote:
>
>> Thank you for your feedback.
>>
>> If it were possible to mix distributions together, e.g. rv1 + rv2, the
>> compound poisson could be represented. AFAIK, PyMC3 supports that with
>> "Mixture" distributions [1]. So I see three options:
>>
>>    1. Implement only a log likelihood function as Josef suggests. In
>>    which package to put it? Scipy, Statsmodels, PyMC3?
>>    2. Ask PyMC3 developers for this distribution.
>>    3. Ask Statsmodel developers for this distribution.
>>    @Josef: I hereby ask you;-)
>>
>> The tricky part: As I intend to calculate the log likelihood via Wrights
>> generalized Bessel function and PR [2] implements this as a private special
>> function, can this function stay in scipy or should it move to the other
>> packages in that case.
>>
>> [1] https://docs.pymc.io/api/distributions/mixture.html
>> [2] https://github.com/scipy/scipy/pull/11313
>>
>
> IMO, scipy would be a good central location for logpdf and associated
> wright.
> All packages like sklearn and statsmodels depend on scipy but not on each
> other.
>
> If wright in special works out, then adding just logpdf would be a good
> short term solution.
> Adding a new base distribution class e.g. for distribution mixtures will
> be a lot more work and won't happen fast.
>
> Because statsmodels has currently only an approximate tweedie logpdf, we
> would add any improved version if it doesn't go into scipy.
> We can add it also to compat.scipy until it is in all scipy versions that
> we support.
>
>
> scipy.stats has now multivariate distributions that don't fit into the old
> univariate distribution setup.
> Similarly, I think it would be possible to add other distributions that
> don't fit into the existing class hierarchy.
> That would be an intermediate step without adding full support for generic
> mixture distributions, or discrete-continuous combinations.
>
> Josef
>
>
> Josef
>
>>
>>
>> Kind regards
>> Christian
>> On 11.03.20 22:41, josef.pktd at gmail.com wrote:
>>
>>
>>
>> On Wed, Mar 11, 2020 at 5:02 PM Robert Kern <robert.kern at gmail.com>
>> wrote:
>>
>>> On Wed, Mar 11, 2020 at 3:35 PM Christian Lorentzen <
>>> lorentzen.ch at gmail.com> wrote:
>>>
>>>> Dear Scipy Developers and mailing list Readers
>>>>
>>>> I'd like to address the issue [1] to implement Tweedie distributions
>>>> [2] in scipy.stats.
>>>>
>>>> Purpose
>>>> The family of Tweedie distributions contains many known distributions
>>>> like the Poisson and the Gamma distribution, but also distributions between
>>>> them, aka compound poisson gamma distribution, see [3]. These are often
>>>> appropriate for insurance claims and other fields, where one has a
>>>> (Poisson) random count process of events and every event has a (Gamma)
>>>> random size/amount.
>>>> The distribution would enable simulations, maximum likelihood
>>>> estimation of all parameters, choice and visualization of distributions,
>>>> etc.
>>>>
>>>> Implementation
>>>> I started PR [4] for Wrights generalized Bessel functions as a private
>>>> function in scipy.special.
>>>> Once this is ready, the pdf follows immediately.
>>>> For the range of interest of Y ~ compound poisson gamma distribution,
>>>> the distribution of Y has a point mass at zero and is otherwise continuous
>>>> for Y>0.
>>>> As already discussed in the issue [1], Tweedie might best fit as
>>>> rv_generic.
>>>> As such, it would be the first one, all others are either rv_discrete
>>>> or rv_continuous.
>>>> Without a template, I would need guidance how to implement a new
>>>> rv_generic.
>>>>
>>> FWIW, `rv_generic` isn't really intended to be a concrete class. It was
>>> only intended to be a base class implementing the common parts needed by
>>> `rv_continuous` and `rv_discrete`. Nothing "fits into" `rv_generic`, per
>>> se. The Tweedie distributions, for some parameters at least, may not fit
>>> into the `scipy.stats` infrastructure at all. We have no infrastructure for
>>> continuous-with-point-mass distributions. `rv_generic` is still built under
>>> the assumption that it's going to be implementing *either* a continuous
>>> *or* a discrete distribution.
>>>
>>> I recommend implementing the functionality that you need outside of
>>> scipy following whatever API solves your problems best. Then we can
>>> evaluate if there is infrastructure that can be built that would help the
>>> second continuous-with-point-mass distribution that we might want next.
>>>
>>
>> a long long time ago, I started a ParametricMixture model for this
>>
>> https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/distributions/otherdist.py
>>
>> (until I gave up on distributions)
>>
>> An alternative as temporary solution would be to add some
>> methods/functions like logpdf to scipy.stats, so statsmodels and sklearn
>> can reuse those.
>>
>> Josef
>>
>>
>>
>>>
>>> --
>>> Robert Kern
>>> _______________________________________________
>>> SciPy-Dev mailing list
>>> SciPy-Dev at python.org
>>> https://mail.python.org/mailman/listinfo/scipy-dev
>>>
>>
>> _______________________________________________
>> SciPy-Dev mailing listSciPy-Dev at python.orghttps://mail.python.org/mailman/listinfo/scipy-dev
>>
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at python.org
>> https://mail.python.org/mailman/listinfo/scipy-dev
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20200313/7ab59f38/attachment.html>


More information about the SciPy-Dev mailing list