[SciPy-User] scipy.stats.fit inquiry

Tue Oct 20 14:41:36 EDT 2009

2009/10/20  <josef.pktd at gmail.com>:
> On Tue, Oct 20, 2009 at 11:13 AM, Anne Archibald
> <peridot.faceted at gmail.com> wrote:
>>
>> Incidentally, I have some code implementing the Kuiper test, a
>> modified K-S test that is sensitive to different aspects of the shape
>> of the distribution, and (more importantly for me) is invariant on
>> shifting a distribution or sample modulo 1. I haven't submitted it for
>> inclusion because the interface I used is a little different from that
>> used by scipy's K-S test, but if there's interest I'd be happy to
>> contribute it.
>
> More tests in scipy.stats is always good (at least as long as I don't
> have to chase down the references to write the tests for the tests.)
> How do you get the pvalue or critical values for Kuiper, since the
> distribution of the test statistic is not a standard distribution?

There's a special function, like the one for the K-S test (but
obviously different in details), which I implemented. I have tests for
the Kuiper test too. Most of the code is at home and in any case needs
some polishing, but I've attached kuiper.py just so you can see what's
there. (There's also some extra stuff for dealing with different
exposure times for different phases, which doesn't need to go in.)

Some references are Paltani 2004, "Searching for periods in X-ray
observations using Kuiper's test. Application to the ROSAT PSPC
archive", and Kuiper 1962, "Tests concerning random points on a
circle" (which I haven't read).

I also have some code for the H test (essentially looking at the
Fourier coefficients to find how many harmonics are worth including
and what the significance is; de Jager et al. 1989 "A poweful test for
weak periodic signals with unknown light curve shape in sparse data").
But circular statistics seem to be a bit obscure, so I'm not sure how
much effort should go into putting this in scipy.

> (From what I understand from Sherpa, is that Cash is used as
> objective function for the maximum likelihood estimation of a
> Poisson process.)

In principle it's more general, but it's used, for example, in the
X-ray spectral fitting program XSPEC for when you only have a handful
of photons in each energy bin.

> Looking for Kuiper, I found a nice overview of a large list of goodness
> of fit tests, used in natural sciences.
>
> http://www.ge.infn.it/statisticaltoolkit/gof/deployment/userdoc/statistics/applicability.html
>
> with article in
> http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=29603&arnumber=1344284&count=103&index=22

Hmm, I'll have to look into the Watson statistic, I haven't run into it before.

> And apropos circular statistic, which I know nothing about:
>
> stats.distribution.vonmises is the only distribution that has bounded
> (or circular) support but doesn't define the bounds (a, b). This
> screws up numerical integration, e.g. for the moment calculation.
>
> Is vonmises actually used on circular support?
> To enable integration, we need to define a and b (e.g [-pi,pi] as
> in numpy random) or introduce new bounds specifically for
> integration in this case.

I use circular statistics quite a bit, since it's the appropriate
toolkit for working with X-ray pulsar pulse profiles. In particular
the von Mises distribution is handy for producing simulated pulse
profiles (there's also a sense in which it is the circular analog of
the normal distribution; it's maximum entropy for a fixed first moment
IIRC). Unfortunately the generic stats interface is quite clumsy for
this particular case. If I recall correctly as of 0.6.0 the interface
was kind of miserable to use, since the CDF had a discontinuity at -pi
and pi, and using loc moved the discontinuity along with the function.
(I think there was also a severe bug for large values of the shape
parameter.) I'm not sure what the interface is like in more recent
versions, since 0.6.0 is what's on the computers here at work, but I
think it's better.

It's worth defining the boundaries, but I don't think you'll get
useful moment calculations out of it, since circular moments are
defined differently from linear moments: rather than int(x**n*pdf(x))
they're int(exp(2,j*pi*n*x)*pdf(x)). There are explicit formulas for
these for the von Mises distribution, though, and they're useful
because they are essentially the Fourier coefficients of the pdf.

For context, I've done some work with representing circular
probability distributions (= pulse profiles) in terms of their moments
(= Fourier series), and in particular using a kernel density estimator
with either a sinc kernel (= truncation) or a von Mises kernel (=
scaling coefficients by von Mises moments). The purpose was to get
not-too-biased estimates of the modulation amplitudes of X-ray
pulsations; the paper has been kind of sidetracked by other work.

Anne
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kuiper.py
Type: text/x-python
Size: 6018 bytes
Desc: not available
URL: <http://mail.scipy.org/pipermail/scipy-user/attachments/20091020/d9ce0577/attachment.py>