[SciPy-dev] Inclusion of Kuiper test in Scipy

Jake VanderPlas jakevdp at gmail.com
Mon Nov 2 10:29:07 EST 2009


Anne,
I also recently needed Kuiper test code for my research, and
adapted an IDL routine to Python.  I'd say it is definitely worth
including.  In addition to what you listed, a routine to calculate the
significance of the Kuiper value would be useful.  I have a Python
version of that code if you'd like to see it.
   -Jake
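
For reference, here is a rough sketch of what a significance routine for
the Kuiper value could look like. It is not the code mentioned above; it
assumes the standard large-sample series approximation, and the name
kuiper_fpp and the terms cutoff are made up for illustration. For two
samples of sizes n1 and n2, n would be replaced by n1*n2/(n1+n2).

```python
import math


def kuiper_fpp(v, n, terms=100):
    """Approximate probability of a Kuiper statistic >= v from n points.

    Hypothetical helper using the asymptotic series
    Q(lam) = 2 * sum_j (4 j^2 lam^2 - 1) * exp(-2 j^2 lam^2).
    """
    sqrt_n = math.sqrt(n)
    # Finite-sample correction applied to the raw statistic.
    lam = (sqrt_n + 0.155 + 0.24 / sqrt_n) * v
    if lam < 0.4:
        # The series converges poorly here; the probability is ~1 anyway.
        return 1.0
    total = 0.0
    for j in range(1, terms + 1):
        a = j * j * lam * lam
        total += (4.0 * a - 1.0) * math.exp(-2.0 * a)
    # Clip to guard against small truncation errors near 1.
    return min(1.0, max(0.0, 2.0 * total))
```

The approximation is only trustworthy for reasonably large n and small
probabilities, so treat the output as a guide rather than an exact value.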

On Mon, Nov 2, 2009 at 6:50 AM, Anne Archibald
<aarchiba at physics.mcgill.ca> wrote:
> Hi,
>
> I have implemented a statistical test from the literature, the Kuiper
> test, for my own work, but I think it might be worth including it in
> Scipy itself. I'd like to hear other people's opinions, though, both
> on what (if anything) should go into scipy, and on whether it needs
> modification. The code is at:
>
> http://github.com/aarchiba/kuiper
>
> This code includes a number of things beyond the basic test, some or
> all of which may not be worth including in Scipy. What's there:
>
> The Kuiper test - analogous to the Kolmogorov-Smirnov test, this takes
> either a sample and a callable CDF or two samples and returns an
> abstract score and the probability that a score that large would have
> arisen if the two arguments are from the same distribution. This test
> is sensitive to somewhat different features of the distribution than
> the K-S test, and, importantly, it is invariant under cyclic
> permutation: that is, if all the samples and distribution are modulo
> (say) 1, then any shift in both arguments leaves the value unaffected.
> Thus it is well suited to periodic distributions.
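
To make the shift invariance concrete, here is a minimal sketch of the
one-sample Kuiper statistic (the sum of the largest deviations of the
empirical CDF above and below the model CDF). This is not the code from
the repository; the function name and signature are assumptions.

```python
import numpy as np


def kuiper_statistic(sample, cdf):
    """One-sample Kuiper statistic V = D+ + D- (illustrative sketch)."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    cdf_vals = cdf(x)
    # Largest excess of the empirical CDF over the model CDF ...
    d_plus = np.max(np.arange(1, n + 1) / n - cdf_vals)
    # ... and of the model CDF over the empirical CDF.
    d_minus = np.max(cdf_vals - np.arange(0, n) / n)
    return d_plus + d_minus


# Phases modulo 1 tested against the uniform CDF on [0, 1).
rng = np.random.default_rng(0)
phases = rng.random(200)
v_original = kuiper_statistic(phases, lambda x: x)
v_shifted = kuiper_statistic((phases + 0.3) % 1.0, lambda x: x)
```

With a uniform model, v_original and v_shifted agree up to rounding,
whereas the K-S statistic would generally change under the same shift.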
>
> The Z_m^2 test - a test for uniformity on [0,1) based on the first m
> Fourier coefficients. Returns a score and the probability of a score
> that large.
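
A sketch of how the Z_m^2 score can be computed, assuming the usual
definition (2/N times the summed squared Fourier sums of the first m
harmonics) and the usual null distribution, chi-squared with 2m degrees
of freedom. The name zm2 is made up here; the closed-form tail sum is
used only to avoid a SciPy dependency, and scipy.stats.chi2.sf(score,
2 * m) should give the same probability.

```python
import math

import numpy as np


def zm2(phases, m):
    """Z_m^2 score and chi-squared (2m dof) tail probability (sketch)."""
    phases = np.asarray(phases, dtype=float)
    n = len(phases)
    k = np.arange(1, m + 1)[:, None]           # harmonic numbers 1..m
    angles = 2.0 * np.pi * k * phases[None, :]
    c = np.cos(angles).sum(axis=1)
    s = np.sin(angles).sum(axis=1)
    score = float((2.0 / n) * np.sum(c ** 2 + s ** 2))
    # For even dof 2m: sf(x) = exp(-x/2) * sum_{j<m} (x/2)^j / j!
    half = score / 2.0
    term, tail = 1.0, 0.0
    for j in range(m):
        tail += term
        term *= half / (j + 1)
    return score, math.exp(-half) * tail
```

The chi-squared distribution is only the large-N limit, so the
probability is approximate for small samples.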
>
> The H test - a test that uses a data-dependent number of harmonics to
> test for uniformity. Returns the score and the probability, and also
> the number of harmonics that gave the most significant detection.
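
A sketch of the H score itself, assuming the standard definition: the
maximum over m = 1..20 of Z_m^2 - 4m + 4, with the maximizing m reported
as the number of harmonics. The function name is hypothetical, and the
probability calculation is deliberately left out because it relies on
fitted formulas that are not reproduced here.

```python
import numpy as np


def h_statistic(phases, max_harmonics=20):
    """Return (H, best_m) for phases in [0, 1) (illustrative sketch)."""
    phases = np.asarray(phases, dtype=float)
    n = len(phases)
    m = np.arange(1, max_harmonics + 1)
    angles = 2.0 * np.pi * m[:, None] * phases[None, :]
    # Power contributed by each individual harmonic.
    power = (2.0 / n) * (
        np.cos(angles).sum(axis=1) ** 2 + np.sin(angles).sum(axis=1) ** 2
    )
    z = np.cumsum(power)               # Z_m^2 for m = 1..max_harmonics
    candidates = z - 4.0 * m + 4.0     # penalize extra harmonics
    best = int(np.argmax(candidates))
    return float(candidates[best]), best + 1
```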
>
> fold_intervals - a function to take a series of weighted intervals and
> return the total exposure of each phase modulo 1. For testing for
> uniformity when you have more data from some phases than others.
>
> cdf_from_intervals - a function to construct a piecewise-linear CDF
> from a set of exposures (as returned by the above function).
>
> histogram_intervals - a function to evaluate how much exposure each
> histogram bin received, to allow testing for uniformity using a
> histogram in the presence of non-uniform exposure.
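
To illustrate the piecewise-linear CDF idea, here is a small sketch. It
does not reproduce the signature of cdf_from_intervals; it assumes the
exposure is given as one constant value per segment between consecutive
breakpoints, and the names are hypothetical.

```python
import numpy as np


def piecewise_linear_cdf(breaks, exposures):
    """Build a CDF from per-segment exposure (illustrative sketch).

    breaks    -- increasing breakpoints, length k + 1
    exposures -- exposure per unit phase on each segment, length k
    """
    breaks = np.asarray(breaks, dtype=float)
    weights = np.asarray(exposures, dtype=float) * np.diff(breaks)
    cumulative = np.concatenate(([0.0], np.cumsum(weights)))
    cumulative /= cumulative[-1]       # normalize so the CDF ends at 1
    # Constant exposure on a segment means the CDF is linear there.
    return lambda x: np.interp(x, breaks, cumulative)


# Three times as much exposure in the second half of the phase range.
cdf = piecewise_linear_cdf([0.0, 0.5, 1.0], [1.0, 3.0])
```

A callable like this could then be passed as the model CDF to a
one-sample test so that uneven exposure is not mistaken for a signal.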
>
> There are also a couple of handy decorators in the test suite:
>
> seed - set the random seed before running a test
> double_check - for randomized tests: run once, and if it fails, run it again.
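
Decorators along these lines might look like the following sketch. The
actual signatures in the test suite are not shown in this message, so
the argument to seed and the choice to retry only on AssertionError are
assumptions.

```python
import functools

import numpy as np


def seed(n):
    """Set NumPy's global random seed before running the test."""
    def decorator(test):
        @functools.wraps(test)
        def wrapper(*args, **kwargs):
            np.random.seed(n)
            return test(*args, **kwargs)
        return wrapper
    return decorator


def double_check(test):
    """Run a randomized test; if it fails once, run it one more time."""
    @functools.wraps(test)
    def wrapper(*args, **kwargs):
        try:
            return test(*args, **kwargs)
        except AssertionError:
            # A second failure propagates to the test runner.
            return test(*args, **kwargs)
    return wrapper


@seed(1234)
def draw():
    return np.random.random()
```

Note that a retry roughly squares the chance of a spurious failure, but
it also makes a genuinely broken test slightly less likely to be caught
on any single run.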
>
> All have tests and somewhat informative docstrings, but I suspect some
> of them may be too specialized to be of much use. The Kuiper test
> should have wide applicability; the Z_m^2 test and H test, not so
> much, although they are handy when testing for periodicity. I'm not
> sure the last batch of utility functions is general enough to be very
> useful, but I needed them.
>
> What do you think? How much of this would be useful in Scipy?
>
> Thanks,
> Anne
> _______________________________________________
> Scipy-dev mailing list
> Scipy-dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>


