[SciPy-User] the skellam distribution
Ernest Adrogué
eadrogue at gmx.net
Mon Nov 9 13:16:51 EST 2009
9/11/09 @ 09:40 (-0500), thus spake josef.pktd at gmail.com:
> Thanks, I think the distribution of the difference of two poisson
> distributed random variables could be useful.
>
> Would you please open an enhancement ticket for this at
> http://projects.scipy.org/scipy/report/1
Okay, I will.
> I had only a brief look at it so far, I had never looked at the
> Skellam distribution before, and just read a few references.
>
> The "if x < 0 .. else ..." will have to be replace with a
> "numpy.where" assignment, since the methods are supposed to work with
> arrays of x (as far as I remember)
Done.
> _rvs could be implemented directly instead of generically (I don't
> find the reference, where I saw it, right now).
I suppose it could be done, but I can't figure out how.
As far as I understand, you can't derive the poisson parameters
from the skellam parameters alone, you need the correlation
coefficient (between the two poisson rv) too, is that right?
> Documentation will be necessary, a brief description in the
> (currently) extradocs, and a listing of the properties for the
> description of the distributions currently in the stats tutorial.
>
> I have some background questions, which address the limitation of the
> implementation (but are not really necessary for inclusion into
> scipy).
>
> The description in R mentions several implementation of Skellam. Do
> you have a rough idea what the range of parameters are for which the
> implementation using ncx produces good results? Do you know if any
> other special functions would produce good results over a larger
> range, e.g. using Bessel function?
No, sorry, I have no idea. The R paper says that in R the Bessel
function is more accurate, but I don't think this means that the
Bessel function implementation is more accurate in general.
I compared results using the Bessel function in scipy and using
the ncx2 implementation, and the differences were minimal, although
I admit my testing wasn't extensive. Another problem is that
I haven't got a table of values for this particular distribution,
so in case results differ I can't tell which one is more accurate.
> Wikipedia, http://en.wikipedia.org/wiki/Skellam_distribution , also
> mentions (but doesn't describe) the case of Skellam distribution with
> correlated Poisson distributions. Do you know what the difference to
> your implementation would be?
As far as I know, it makes no difference if the two Poisson
variates are correlated or not. The Skellam parameters are
defined as
mu1 = lam1 - rho * sqrt(lam1*lam2)
mu2 = lam2 - rho * sqrt(lam1*lam2)
where lam1 and lam2 are the Poisson means and rho is the
correlation coefficient. So, in case there is correlation it
is implicit in the parameters mu1 and mu2, and it doesn't
make any difference in terms of calculating the pmf or cdf.
Again, I am no statician or mathematician.
> Tests for a new distribution will be picked up by the generic tests,
> but it would be useful to have some extra tests for extreme/uncommon
> parameter ranges. Do you have any comparisons with R, since you
> already looked it?
I have this visual test, although it's usefulness is
questionable. It shows the deviation between observed and
expected frequencies for a Skellam random variable. Ideally
errors should be random and centered around 0. To me
it looks like errors tend to increase around the mean
and are smaller along the tails, but I don't know if this
means something or not.
import numpy
import scipy.stats.distributions
poisson = scipy.stats.distributions.poisson
ncx2 = scipy.stats.distributions.ncx2
# Skellam distribution
class skellam_gen(scipy.stats.distributions.rv_discrete):
def _pmf(self, x, mu1, mu2):
px = numpy.where(x < 0, ncx2.pdf(2*mu2, 2*(1-x), 2*mu1)*2,
ncx2.pdf(2*mu1, 2*(x+1), 2*mu2)*2)
return px
def _cdf(self, x, mu1, mu2):
x = numpy.floor(x)
px = numpy.where(x < 0, ncx2.cdf(2*mu2, -2*x, 2*mu1),
1-ncx2.cdf(2*mu1, 2*(x+1), 2*mu2))
return px
def _stats(self, mu1, mu2):
mean = mu1 - mu2
var = mu1 + mu2
g1 = (mean) / numpy.sqrt((var)**3)
g2 = 1 / var
return mean, var, g1, g2
skellam = skellam_gen(a=-numpy.inf, name="skellam", longname='A Skellam',
shapes="mu1,mu2", extradoc="")
if __name__ == '__main__':
lam1 = 3.4
lam2 = 6.1
n = 5000
poisson_var1 = numpy.random.poisson(lam1, n)
poisson_var2 = numpy.random.poisson(lam2, n)
skellam_var = poisson_var1-poisson_var2
low = min(skellam_var)
high = max(skellam_var)
obs_freq = numpy.histogram(
skellam_var, numpy.arange(low, high+2))[0] / float(n)
rho = numpy.corrcoef(poisson_var1, poisson_var2)[1,0]
mu1 = lam1 - rho * numpy.sqrt(lam1*lam2)
mu2 = lam2 - rho * numpy.sqrt(lam1*lam2)
exp_freq = skellam.pmf(numpy.arange(low, high+1), mu1, mu2)
print obs_freq-exp_freq
# plot
import matplotlib.pyplot as plt
plt.figure().add_subplot(1,1,1).plot(obs_freq-exp_freq)
plt.show()
Regards.
--
Ernest
More information about the SciPy-User
mailing list