[SciPy-user] statistical tests notes

Fri Mar 27 14:35:20 EDT 2009

There was a discussion in ticket 901 on some of the statistical tests
in scipy.stats, and I thought I would post some notes that I keep to have
an overview of their status. This doesn't cover all of stats (e.g. it
leaves out descriptive statistics).

Josef

Inferential Statistics
======================

tests for location:
-------------------

t-tests and similar
    ttest_1samp
    ttest_ind
    ttest_rel
    f_oneway (F-test)
    glm

    Notes
    -----
    the t-tests (ttest_1samp, ttest_ind and ttest_rel) have been rewritten
        and are well tested
    glm: has a very incomplete description, is essentially just a t-test,
        needs a rewrite
    f_oneway: verified against the NIST test set for balanced ANOVA; correct,
        but loses numerical precision at medium to high difficulty. I have
        a rewrite with higher numerical precision
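
    A quick consistency check for the location tests, as a minimal sketch:
    with exactly two groups, the one-way ANOVA F statistic is the square of
    the pooled-variance t statistic, so f_oneway and ttest_ind should return
    the same p-value. The simulated data, sample sizes and seed are arbitrary
    assumptions and have nothing to do with the NIST test set.

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)  # arbitrary seed, only for reproducibility
x = rng.normal(loc=0.0, scale=1.0, size=30)
y = rng.normal(loc=0.5, scale=1.0, size=40)

# ttest_ind pools the variances by default (equal_var=True),
# which is the case that matches one-way ANOVA with two groups
t_stat, t_pval = stats.ttest_ind(x, y)
f_stat, f_pval = stats.f_oneway(x, y)

print("t**2 = %.6f, F = %.6f" % (t_stat ** 2, f_stat))
print("p (t-test) = %.6f, p (F-test) = %.6f" % (t_pval, f_pval))
```

    This only checks internal consistency between the two functions; it says
    nothing about numerical precision on difficult data.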

rank based tests (some are equivalent)
    mannwhitneyu
    ranksums
    wilcoxon
    kruskal
    friedmanchisquare

    Notes
    -----
    For 2 random variables and no ties, mannwhitneyu, ranksums and kruskal
        are equivalent (i.e. they return the same p-values, but based on
        different statistics)
    kruskal: has correct tie handling and works for more than two
        random variables
    friedmanchisquare: verified (tie handling corrected)
    mannwhitneyu: corrected, verified
    ranksums: no tie handling

    look at Monte Carlo p-values again; initial experiments didn't show
        an improvement
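
    The equivalence for two samples without ties can be checked directly.
    The following is a sketch under some assumptions: the data are simulated
    and continuous (so there are no ties), mannwhitneyu is called without
    continuity correction, and both samples have more than 8 observations so
    that newer SciPy versions use the normal approximation rather than the
    exact distribution. The alternative keyword is assumed to be available
    in the installed SciPy version.

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(12345)  # arbitrary seed
# continuous data, so ties occur with probability zero
x = rng.normal(size=30)
y = rng.normal(loc=0.3, size=40)

z_stat, p_ranksums = stats.ranksums(x, y)
h_stat, p_kruskal = stats.kruskal(x, y)
# normal approximation without continuity correction
u_stat, p_mwu = stats.mannwhitneyu(x, y, use_continuity=False,
                                   alternative='two-sided')

# for two groups without ties the Kruskal-Wallis H equals z**2 of ranksums
print("z**2 = %.6f, H = %.6f" % (z_stat ** 2, h_stat))
print("p-values: %.6f %.6f %.6f" % (p_ranksums, p_kruskal, p_mwu))
```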

tests for scale:
----------------
    ansari
    bartlett
    levene
    fligner
    mood

    Notes
    -----
    I didn't verify any of them by comparing to R or Matlab
    Brief checking with Monte Carlo shows that they work (they reject a
        wrong null and do not reject a correct null)

tests for distribution:
-----------------------

general
    chisquare
    kstest
    ks_2samp
    anderson

    Notes
    -----
    kstest, ks_2samp: rewritten and verified
    anderson: may be fishy, but I didn't look at it very carefully
    chisquare: I use a copy of it in the tests for the discrete
        distributions, and it seems to work well
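
    For orientation, a minimal usage sketch of three of these functions on
    simulated data. The die example, the sample sizes and the seed are
    illustrative assumptions and are not taken from the scipy test suite.

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)  # arbitrary seed
nobs = 600

# chisquare: observed counts of a fair six-sided die against equal expected
# frequencies (the default when f_exp is not given)
rolls = rng.randint(0, 6, size=nobs)
counts = np.bincount(rolls, minlength=6)
chi2_stat, chi2_pval = stats.chisquare(counts)

# kstest: continuous sample against a fully specified distribution
x = rng.normal(size=nobs)
ks_stat, ks_pval = stats.kstest(x, 'norm')

# ks_2samp: two samples whose locations differ by one standard deviation
y = rng.normal(loc=1.0, size=nobs)
ks2_stat, ks2_pval = stats.ks_2samp(x, y)

print("chisquare: stat=%.4f p=%.4f" % (chi2_stat, chi2_pval))
print("kstest:    stat=%.4f p=%.4f" % (ks_stat, ks_pval))
print("ks_2samp:  stat=%.4f p=%.4g" % (ks2_stat, ks2_pval))
```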

for normal distribution
    skewtest
    kurtosistest
    normaltest
    shapiro

    Notes
    -----
    not verified, but they look ok in brief Monte Carlo tests and when used
        in examples
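
    One internal consistency check that is easy to run: normaltest is the
    D'Agostino-Pearson omnibus test, so its statistic should equal the sum
    of the squared z-scores from skewtest and kurtosistest, with a p-value
    from a chi-square distribution with 2 degrees of freedom. The sample
    below is simulated with an arbitrary seed.

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)  # arbitrary seed
x = rng.normal(size=200)

z_skew, p_skew = stats.skewtest(x)
z_kurt, p_kurt = stats.kurtosistest(x)
k2, pval = stats.normaltest(x)

# rebuild the omnibus statistic and p-value from the two component tests
k2_manual = z_skew ** 2 + z_kurt ** 2
pval_manual = stats.chi2.sf(k2_manual, 2)

print("normaltest: %.6f, manual: %.6f" % (k2, k2_manual))
print("p-value:    %.6f, manual: %.6f" % (pval, pval_manual))
```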

other
    binom_test

    Notes
    -----
    no idea

ANOVA F-tests
-------------

    f_oneway
    (the following three do not calculate statistics from data and do not
    return p-values)
    f_value_wilks_lambda
    f_value
    f_value_multivariate

    Notes
    -----
    f_oneway see above
    others no idea

Correlation measures including p-values
---------------------------------------

    pearsonr
    spearmanr
    pointbiserialr
    kendalltau

    Notes
    -----
    pearsonr: just the standard correlation coefficient, can be rewritten
        (mostly delegated to numpy.corrcoef)
    spearmanr: needs rewriting, no tie handling yet, can be reduced to
        corrcoef on rankdata
    pointbiserialr: can be reduced to np.corrcoef; drop it?
    kendalltau: verified, but the p-value (variance) does not correct for
        ties; an extension in Cython is attached to the ticket (but it has
        no p-values)
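
    The reductions to corrcoef mentioned above can be sketched as follows.
    This only compares the correlation coefficients, not the p-values, and
    uses simulated continuous data without ties; the variable names and the
    seed are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)  # arbitrary seed
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)

# pearsonr versus plain corrcoef
r_pearson, _ = stats.pearsonr(x, y)
r_corrcoef = np.corrcoef(x, y)[0, 1]

# spearmanr versus corrcoef on the ranks
rho_spearman, _ = stats.spearmanr(x, y)
rho_ranks = np.corrcoef(stats.rankdata(x), stats.rankdata(y))[0, 1]

# pointbiserialr versus corrcoef with a 0/1 variable
group = (rng.uniform(size=50) > 0.5).astype(float)
r_pb, _ = stats.pointbiserialr(group, y)
r_pb_corrcoef = np.corrcoef(group, y)[0, 1]

print("pearsonr:       %.6f  corrcoef: %.6f" % (r_pearson, r_corrcoef))
print("spearmanr:      %.6f  corrcoef on ranks: %.6f" % (rho_spearman, rho_ranks))
print("pointbiserialr: %.6f  corrcoef: %.6f" % (r_pb, r_pb_corrcoef))
```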

Distributions - diagnostics and graphical analysis
==================================================

Box-Cox transformation: only checked that the functions run

plots: look ok, converted to matplotlib

pdfapprox: broken; I have an enhanced rewrite, but no good tests yet