[SciPy-User] the skellam distribution

Mon Nov 9 11:07:02 EST 2009

On Mon, Nov 9, 2009 at 10:53 AM, Bruce Southey <bsouthey at gmail.com> wrote:
> On 11/09/2009 08:40 AM, josef.pktd at gmail.com wrote:
>> 2009/11/8 Ernest Adrogué<eadrogue at gmx.net>:
>>
>>> Hi,
>>>
>>> In case somebody is interested, or you want to include it
>>> in scipy. I used these specs here from the R package:
>>> cran.r-project.org/web/packages/skellam/skellam.pdf
>>>
>>> Note that I am no statician, somebody who knows what he's
>>> doing (as opposed to me ;) should verify it's correct.
>>>
>>>
>>> import numpy
>>> import scipy.stats.distributions
>>>
>>> # Skellam distribution
>>>
>>> ncx2 = scipy.stats.distributions.ncx2
>>>
>>> class skellam_gen(scipy.stats.distributions.rv_discrete):
>>>     def _pmf(self, x, mu1, mu2):
>>>         if x<  0:
>>>             px = ncx2.pdf(2*mu2, 2*(1-x), 2*mu1)*2
>>>         else:
>>>             px = ncx2.pdf(2*mu1, 2*(x+1), 2*mu2)*2
>>>         return px
>>>     def _cdf(self, x, mu1, mu2):
>>>         x = numpy.floor(x)
>>>         if x<  0:
>>>             px = ncx2.cdf(2*mu2, x*(-2), 2*mu1)
>>>         else:
>>>             px = 1-ncx2.cdf(2*mu1, 2*(x+1), 2*mu2)
>>>         return px
>>>     def _stats(self, mu1, mu2):
>>>         mean = mu1 - mu2
>>>         var = mu1 + mu2
>>>         g1 = (mu1 - mu2) / numpy.sqrt((mu1 + mu2)**3)
>>>         g2 = 1 / (mu1 + mu2)
>>>         return mean, var, g1, g2
>>> skellam = skellam_gen(a=-numpy.inf, name="skellam", longname='A Skellam',
>>>                       shapes="mu1,mu2", extradoc="")
>>>
>>>
>> Thanks, I think the distribution of the difference of two poisson
>> distributed random variables could be useful.
>>
>> Would you please open an enhancement ticket for this at
>> http://projects.scipy.org/scipy/report/1
>>
>> I had only a brief look at it so far, I had never looked at the
>> Skellam distribution before, and just read a few references.
>>
>> The "if x<  0 .. else ..." will have to be replace with a
>> "numpy.where" assignment, since the methods are supposed to work with
>> arrays of x (as far as I remember)
>>
>> _rvs could be implemented directly instead of generically (I don't
>> find the reference, where I saw it, right now).
>>
>> Documentation will be necessary,  a brief description in the
>> (currently) extradocs, and a listing of the properties for the
>> description of the distributions currently in the stats tutorial.
>>
>> I have some background questions, which address the limitation of the
>> implementation (but are not really necessary for inclusion into
>> scipy).
>>
>> The description in R mentions several implementation of Skellam. Do
>> you have a rough idea what the range of parameters are for which the
>> implementation using ncx produces good results? Do you know if any
>> other special functions would produce good results over a larger
>> range, e.g. using Bessel function?
>>
>> Wikipedia, http://en.wikipedia.org/wiki/Skellam_distribution , also
>> mentions (but doesn't describe) the case of Skellam distribution with
>> correlated Poisson distributions. Do you know what the difference to
>> your implementation would be?
>>
>> Tests for a new distribution will be picked up by the generic tests,
>> but it would be useful to have some extra tests for extreme/uncommon
>> parameter ranges. Do you have any comparisons with R, since you
>> already looked it?
>>
>>
>> Thanks again, I'm always looking out for new useful distributions,
>> (but I have to find the time to do the testing and actual
>> implementation).
>>
>> Josef
>>
>>
>>
>>> Bye.
>>>
>>> --
>>> Ernest
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>
>>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>
> Generally any R code can not be used in numpy because R is GPL. Usually
> R code is also licensed under GPL  so translation from R to Python/numpy
> still maintains the original license. So the code not used by numpy
> unless that code is licensed under a BSD compatible license.
>
> You *must* show that you implementation is from a BSD-compatible source
> not from the R package. I can see that your code is very simple so there
> should be an viable alternative source.

We only need to read the R documentation and not the R code:

pskellam(x,lambda1,lambda2) returns pchisq(2*lambda2, -2*x, 2*lambda1)
for x <= 0 and 1 - pchisq(2*lambda1, 2*(x+1), 2*lambda2) for x >= 0. When
pchisq incorrectly returns 0, a saddlepoint approximation is
substituted, which typically gives at
least 2-figure accuracy.
The quantile is defined as the smallest value x such that F(x)  p,
where F is the distribution
function. For lower.tail=FALSE, the quantile is defined as the largest
value x such that
F(x;lower.tail=FALSE)  p.
rskellam is calculated as rpois(n,lambda1)-rpois(n,lambda2)
dskellam.

and

The relation of dgamma to the modified Bessel function of the first
kind was given by Skellam
(1946). The relation of pgamma to the noncentral chi-square was given
by Johnson (1959). Tables
are given by Strackee and van der Gon (1962), which can be used to
verify this implementation (cf.
direct calculation in the examples below).

the rest follows from the Wikipedia page (which is also in the list of
references in R docs), there is no copyright on the definition of a
distribution.

>
> Also, in the _stats function why do you do not re-use the mean and var
> variables in computing the g1 and g2 variables?
>
> What are 'x, mu1, mu2' ?

x is the integer at which cdf pr pmf are calculated, mu1,mu2 are the
parameters of the poisson distributions, following wikipedia.

Josef

> This looks like a scalar implementation so you need to either check that
> or allow for array-like inputs.
>
> Bruce
>
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>