[SciPy-Dev] Tweedie distributions in scipy.stats

rlucas7 at vt.edu
Sun Mar 15 19:00:22 EDT 2020


> On Mar 13, 2020, at 12:46 PM, Robert Kern <robert.kern at gmail.com> wrote:
> 
> 
>> On Fri, Mar 13, 2020 at 12:15 PM <josef.pktd at gmail.com> wrote:
> 
>> 
>> 
>>> On Fri, Mar 13, 2020 at 12:04 PM <josef.pktd at gmail.com> wrote:
>>> Aside:
>>> compound Poisson is a convolution of distributions and not a finite mixture.
>>> Allowing an infinite mixing distribution like Poisson creates numerical problems in the upper tail that are not easy to solve in general.
>>> In most cases, the computation has to be truncated in the upper tail, but then the problem is to figure out the truncation threshold for a required precision.
>>> My guess is that this would be a lot of work to get it to scipy standards.
>>> 
>>> A long time ago I looked at the general case of convolutions and compound Poisson, mainly using FFT to get the pdf and cdf from the characteristic function; the cf is relatively simple to compute for convolutions. The references in extreme value and risk applications that I looked at emphasized tail precision and ways to work around it, or compared how precise different methods are.
>>> fft was fast, but I only eyeballed the truncation threshold for my examples.
>> 
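A minimal sketch of the FFT route for a compound Poisson sum, assuming the jump (severity) distribution has already been discretized on an equally spaced grid. The function name compound_poisson_fft and the gamma-jump example are illustrative assumptions, not code from this thread. The FFT length acts as the truncation threshold discussed above: mass beyond the end of the grid wraps around to the start (aliasing), so the grid has to be long enough for the upper tail, which this sketch does not check automatically.

```python
import numpy as np
from scipy import stats


def compound_poisson_fft(lam, severity_pmf, n_grid):
    """Compound Poisson pmf on a lattice via FFT (hypothetical helper).

    lam          -- Poisson rate of the number of jumps N
    severity_pmf -- probabilities of a single jump at grid points 0, h, 2h, ...
    n_grid       -- FFT length; mass beyond n_grid * h wraps around (aliasing)
    """
    f = np.zeros(n_grid)
    m = min(len(severity_pmf), n_grid)
    f[:m] = severity_pmf[:m]
    phi = np.fft.fft(f)                          # transform of one jump on the lattice
    g = np.fft.ifft(np.exp(lam * (phi - 1.0)))   # compound Poisson: exp(lam * (phi - 1))
    return np.real(g)


# Example (assumed values): Poisson(lam) number of Gamma(shape, scale) jumps, grid step h
lam, shape, scale, h = 2.0, 1.5, 1.0, 0.01
n_grid = 2**14                                   # grid covers [0, 163.84)
edges = np.arange(n_grid + 1) * h
sev = np.diff(stats.gamma.cdf(edges, shape, scale=scale))  # mass of each cell [k*h, (k+1)*h)
sev_shift = np.concatenate(([0.0], sev[:-1]))    # round each jump up to grid point k + 1
pmf = compound_poisson_fft(lam, sev_shift, n_grid)
```

Because every jump is rounded up to a strictly positive grid point, pmf[0] equals exp(-lam), the point mass at zero, while the rest of the grid approximates the continuous part.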
I think I also looked at this at my previous employer; I believe the reference I used is this one:
https://eprints.usq.edu.au/3888/1/Dunn_Smyth_Stats_and_Comp_v18n1.pdf
Hopefully that helps.

>> I thought of representing Tweedie for the computation as a mixture of a mass point/discrete distribution and a distribution for the continuous part, so we can handle the two parts separately.

I came to the same conclusion after thinking about this a bit over the last few days. 

>> Following this, it might be possible to add a zero-truncated Tweedie distribution as a continuous distribution subclass in scipy. 
>> Then we could just add a simple mixture of the mass point at zero and the zero-truncated Tweedie.
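
A sketch of that split for 1 < p < 2, assuming the standard compound Poisson-gamma representation of the Tweedie distribution with mean mu, dispersion phi and power p: a Poisson(lam) number of gamma terms, each with shape alpha and scale theta. The function names are hypothetical. The same three parameters fix both the zero mass exp(-lam) and the continuous part, and the infinite Poisson mixture is cut off where the neglected Poisson tail falls below tol, a crude version of the truncation issue raised earlier in the thread.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad


def tweedie_cpg_params(mu, phi, p):
    """Map Tweedie (mean mu, dispersion phi, power 1 < p < 2) to Poisson-gamma parameters."""
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))   # Poisson rate of the number of gamma terms
    alpha = (2.0 - p) / (p - 1.0)               # gamma shape of each term
    theta = phi * (p - 1.0) * mu ** (p - 1.0)   # gamma scale of each term
    return lam, alpha, theta


def tweedie_zero_mass(mu, phi, p):
    """Point mass at zero: P(Y = 0) = P(N = 0) = exp(-lam)."""
    lam, _, _ = tweedie_cpg_params(mu, phi, p)
    return np.exp(-lam)


def tweedie_positive_density(y, mu, phi, p, tol=1e-12):
    """Density of the continuous part on y > 0; it integrates to 1 - P(Y = 0).

    The Poisson mixture is truncated at n_max so that the neglected
    Poisson tail mass is below tol.
    """
    lam, alpha, theta = tweedie_cpg_params(mu, phi, p)
    n_max = int(stats.poisson.isf(tol, lam)) + 1
    n = np.arange(1, n_max + 1)
    y = np.atleast_1d(np.asarray(y, dtype=float))
    weights = stats.poisson.pmf(n, lam)                          # P(N = n), n >= 1
    dens = stats.gamma.pdf(y[:, None], n * alpha, scale=theta)   # sum of n gammas ~ Gamma(n * alpha)
    return dens @ weights


mu, phi, p = 2.0, 1.0, 1.5
p0 = tweedie_zero_mass(mu, phi, p)
cont_mass, _ = quad(lambda t: tweedie_positive_density(t, mu, phi, p)[0], 0, np.inf)
# p0 + cont_mass should be 1 up to truncation and quadrature error
```

A production version may need a more careful truncation rule for extreme parameter values (large lam, or p close to 1 or 2).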

The difference with the zero-inflated Poisson is that it can be handled directly within the rv_discrete framework (I think).
An rv_continuous with a point mass at zero would handle a Tweedie with 1 < p < 2, as well as every other non-artificial point-mass mixture I could think of, e.g. a mixture of a normal and a point mass at 0 (used in stochastic search variable selection) and a mixture of a point mass at 0 and a moment normal (used in Valen Johnson’s work).
These seem to be the common applications: a continuous distribution mixed with a point mass at 0.
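
A rough sketch of such a point-mass mixture, written as a small standalone class that wraps any frozen scipy.stats continuous distribution rather than as an rv_continuous subclass, since the generic rv_continuous machinery derives the cdf and moments from a pdf, which a jump at zero does not have. The class name ZeroMassMixture and its methods are hypothetical and not part of scipy.stats; the weights and slab below are assumed example values.

```python
import numpy as np
from scipy import stats


class ZeroMassMixture:
    """Hypothetical mixture w * delta_0 + (1 - w) * F for a frozen continuous F."""

    def __init__(self, w, continuous):
        if not 0.0 <= w <= 1.0:
            raise ValueError("w must lie in [0, 1]")
        self.w = w
        self.cont = continuous

    def cdf(self, x):
        x = np.asarray(x, dtype=float)
        jump = np.where(x >= 0.0, self.w, 0.0)      # step of height w at zero
        return jump + (1.0 - self.w) * self.cont.cdf(x)

    def prob_zero(self):
        """P(X == 0); the continuous part puts no mass on a single point."""
        return self.w

    def pdf_continuous(self, x):
        """Density of the continuous part (integrates to 1 - w)."""
        return (1.0 - self.w) * self.cont.pdf(x)

    def mean(self):
        return (1.0 - self.w) * self.cont.mean()

    def var(self):
        # E[X^2] = (1 - w) * (v + m^2), since the point mass contributes 0
        m, v = self.cont.mean(), self.cont.var()
        return (1.0 - self.w) * (v + m ** 2) - self.mean() ** 2

    def rvs(self, size, rng=None):
        rng = np.random.default_rng(rng)
        from_cont = rng.random(size) >= self.w      # True -> draw from the continuous part
        out = np.zeros(size)
        out[from_cont] = self.cont.rvs(size=int(from_cont.sum()), random_state=rng)
        return out


# spike-and-slab style example: point mass at 0 plus a N(0, 2^2) slab
spike_slab = ZeroMassMixture(0.7, stats.norm(loc=0.0, scale=2.0))
```

A Tweedie with 1 < p < 2 would use exp(-lam) as w and a normalized zero-truncated continuous part as F; keeping both tied to the same parameters is the coordination problem that makes this awkward for users.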
> 
> That could certainly work. It seems like handling that smoothly may be a pain for the user; you'd have to coordinate the effect of the parameters on both the size of the point mass and the continuous part, as well as the mixture.
> 
> My recommendation is to implement this in its own package, using whatever frameworks you find help you solve your data analysis problems. Then we can figure out where it ought to finally live and how to extend the existing frameworks to handle this case best.

Thanks for suggesting this, Robert; it is a wise strategy. It will let us work out something that generalizes beyond the specifics of the Tweedie distribution.

> The code doesn't have to start out in scipy.stats in order to make use of the scipy.stats framework. Please do continue to put the necessary special functions into scipy.special; that framework is a little harder to use outside of scipy.special. If you need my vote of support for that on that PR, you have it.

I found this implementation by another statsmodels developer, which may be helpful as a reference:

https://github.com/thequackdaddy/tweedie/blob/master/tweedie/tweedie_dist.py
 
Hopefully you find it helpful. 

> 
> -- 
> Robert Kern
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev