[SciPy-User] the skellam distribution

Mon Nov 9 11:58:04 EST 2009

On Mon, Nov 9, 2009 at 11:19 AM, Bruce Southey <bsouthey at gmail.com> wrote:
> On 11/09/2009 10:01 AM, Johann Cohen-Tanugi wrote:
>>    From what I understand of the initial statement from Ernest:
>> "
>> In case somebody is interested, or you want to include it
>> in scipy. I used these specs here from the R package:
>> cran.r-project.org/web/packages/skellam/skellam.pdf
>> "
>>
>> he used the spec, as defined in this pdf, and did not look at the code
>> itself. If my interpretation of the small preamble above is correct, I
>> believe his implementation is not GPL-tainted, right?
>>
>> Johann
>> Bruce Southey wrote:
>>
>>> On 11/09/2009 08:40 AM, josef.pktd at gmail.com wrote:
>>>
>>>
>>>> 2009/11/8 Ernest Adrogué<eadrogue at gmx.net>:
>>>>
>>>>
>>>>
>>>>> Hi,
>>>>>
>>>>> In case somebody is interested, or you want to include it
>>>>> in scipy. I used these specs here from the R package:
>>>>> cran.r-project.org/web/packages/skellam/skellam.pdf
>>>>>
>>>>> Note that I am no statician, somebody who knows what he's
>>>>> doing (as opposed to me ;) should verify it's correct.
>>>>>
>>>>>
>>>>> import numpy
>>>>> import scipy.stats.distributions
>>>>>
>>>>> # Skellam distribution
>>>>>
>>>>> ncx2 = scipy.stats.distributions.ncx2
>>>>>
>>>>> class skellam_gen(scipy.stats.distributions.rv_discrete):
>>>>>      def _pmf(self, x, mu1, mu2):
>>>>>          if x<   0:
>>>>>              px = ncx2.pdf(2*mu2, 2*(1-x), 2*mu1)*2
>>>>>          else:
>>>>>              px = ncx2.pdf(2*mu1, 2*(x+1), 2*mu2)*2
>>>>>          return px
>>>>>      def _cdf(self, x, mu1, mu2):
>>>>>          x = numpy.floor(x)
>>>>>          if x<   0:
>>>>>              px = ncx2.cdf(2*mu2, x*(-2), 2*mu1)
>>>>>          else:
>>>>>              px = 1-ncx2.cdf(2*mu1, 2*(x+1), 2*mu2)
>>>>>          return px
>>>>>      def _stats(self, mu1, mu2):
>>>>>          mean = mu1 - mu2
>>>>>          var = mu1 + mu2
>>>>>          g1 = (mu1 - mu2) / numpy.sqrt((mu1 + mu2)**3)
>>>>>          g2 = 1 / (mu1 + mu2)
>>>>>          return mean, var, g1, g2
>>>>> skellam = skellam_gen(a=-numpy.inf, name="skellam", longname='A Skellam',
>>>>>                        shapes="mu1,mu2", extradoc="")
>>>>>
>>>>>
>>>>>
>>>>>
>>>> Thanks, I think the distribution of the difference of two poisson
>>>> distributed random variables could be useful.
>>>>
>>>> Would you please open an enhancement ticket for this at
>>>> http://projects.scipy.org/scipy/report/1
>>>>
>>>> I had only a brief look at it so far, I had never looked at the
>>>> Skellam distribution before, and just read a few references.
>>>>
>>>> The "if x<   0 .. else ..." will have to be replace with a
>>>> "numpy.where" assignment, since the methods are supposed to work with
>>>> arrays of x (as far as I remember)
>>>>
>>>> _rvs could be implemented directly instead of generically (I don't
>>>> find the reference, where I saw it, right now).
>>>>
>>>> Documentation will be necessary,  a brief description in the
>>>> (currently) extradocs, and a listing of the properties for the
>>>> description of the distributions currently in the stats tutorial.
>>>>
>>>> I have some background questions, which address the limitation of the
>>>> implementation (but are not really necessary for inclusion into
>>>> scipy).
>>>>
>>>> The description in R mentions several implementation of Skellam. Do
>>>> you have a rough idea what the range of parameters are for which the
>>>> implementation using ncx produces good results? Do you know if any
>>>> other special functions would produce good results over a larger
>>>> range, e.g. using Bessel function?
>>>>
>>>> Wikipedia, http://en.wikipedia.org/wiki/Skellam_distribution , also
>>>> mentions (but doesn't describe) the case of Skellam distribution with
>>>> correlated Poisson distributions. Do you know what the difference to
>>>> your implementation would be?

same form different interpretation

"The distributions of the dierence between two independent and two
bivariate (correlated)
Poisson variates are of the same form. However, the interpretation of
the parameters is different.
Assuming that the bivariate Poisson distribution is the correct
distribution, then the
marginal means x and y will be unbiased estimates of 1 + 3 and 2
+ 3, respectively,
instead of the parameters of interest 1 and 2. Therefore, the
parameters of the PD distribution
are not directly connected to the marginal means of the actual Poisson
distributions."

from
Bayesian analysis of the dierences of count data
D. Karlis and I. Ntzoufras
STATISTICS IN MEDICINE
Statist. Med. 2006; 25:1885–1905

they have some funny application to soccer scores
http://stat-athens.aueb.gr/~jbn/publications.htm

Josef
Published online 26 October 2005 in Wiley InterScience
(www.interscience.wiley.com). DOI: 10.1002/sim.2382

>>>>
>>>> Tests for a new distribution will be picked up by the generic tests,
>>>> but it would be useful to have some extra tests for extreme/uncommon
>>>> parameter ranges. Do you have any comparisons with R, since you
>>>> already looked it?
>>>>
>>>>
>>>> Thanks again, I'm always looking out for new useful distributions,
>>>> (but I have to find the time to do the testing and actual
>>>> implementation).
>>>>
>>>> Josef
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> Bye.
>>>>>
>>>>> --
>>>>> Ernest
>>>>> _______________________________________________
>>>>> SciPy-User mailing list
>>>>> SciPy-User at scipy.org
>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>
>>>>>
>>>>>
>>>>>
>>>> _______________________________________________
>>>> SciPy-User mailing list
>>>> SciPy-User at scipy.org
>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>
>>>>
>>>>
>>> Generally any R code can not be used in numpy because R is GPL. Usually
>>> R code is also licensed under GPL  so translation from R to Python/numpy
>>> still maintains the original license. So the code not used by numpy
>>> unless that code is licensed under a BSD compatible license.
>>>
>>> You *must* show that you implementation is from a BSD-compatible source
>>> not from the R package. I can see that your code is very simple so there
>>> should be an viable alternative source.
>>>
>>> Also, in the _stats function why do you do not re-use the mean and var
>>> variables in computing the g1 and g2 variables?
>>>
>>> What are 'x, mu1, mu2' ?
>>> This looks like a scalar implementation so you need to either check that
>>> or allow for array-like inputs.
>>>
>>> Bruce
>>>
>>>
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>
>>>
>>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
> I am not a lawyer! But I do not see that any reference to not seeing the
> code. Furthermore, there is insufficient information in the cited
> reference for this implementation (but I have not seen the actual code
> and would rather not have to see it). But, as Josef pointed out, there
> is a Wikipedia source so it should be trivial to show that this code is
> independent of the R implementation.
>
> Bruce
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>