[SciPy-Dev] (no subject)

Clint Valentine valentine.c at husky.neu.edu
Mon Dec 22 19:47:04 EST 2014


I ported an algorithm originally written in MatLab to Python for generating
a random contingency table (when one exists) for a statistical test I am
writing. I know something like this doesn't exist in
scipy.stats.contingency and was wondering if it would be a good addition? I
haven't developed any tests for the function yet but can if needed.

I have just started developing in Python as an undergraduate student
studying computational biology. This would be my first time contributing to
open source and I am looking forward to the the experience and all the
advice you are willing to offer. Thanks for your time.

-Clint

def rcont(ncol, nrow, ncolt, nrowt):
    """
    Return a random contingency table based on the two-dimensional shape
    and marginal totals of a given contingency table

    There must be 1 or many tables with nonnegative, integral entries
    that have the given shape and marginal sums as input. The function
    will generate, at random, one of the tables and return it. Repeated
    calls to the function will return new random selections.

    Parameters:
    -----------
    ncol : integer
        The vector length of the columns in the contingency table

    nrow : integer
        The vector length of the rows in the contingency table

    ncolt : array_like
        The vector of the marginal column totals in the contingency table

    nrowt : array_like
        The vector of the marginal row totals in the contingency table

    Returns
    -------
    imatrix : ndarray of integers
        The randomly generated two-dimensional ndarray preserving the
        same shape and marginal totals provided if exists

    Reference
    ---------
    Algorithm AS144:  Appl. Statist. (1979) v.28, pp.329-332

    Licensing:
        GNU LGPL license

    Author:
        Original FORTRAN77 version by James Boyett.
        MATLAB version by John Burkardt.
        Python version by Clint Valentine

    Reference:
        James Boyett,
        Algorithm AS 144:
        Random R x C Tables with Given Row and Column Totals,
        Applied Statistics,
        Volume 28, Number 3, pages 329-332, 1979.
    """
    # Initialize and permute vector
    nvect = np.array(xrange(sum(ncolt)), dtype=np.int_)
    np.random.shuffle(nvect)

    # Construct random matrix and partial column sums
    imatrix = np.zeros([nrow, ncol], dtype=np.int_)
    nsubt = np.cumsum(ncolt)

    # Generate random RxC contingency tables preserving shape and
    # marginal column and row sums and total sum.
    count = 0
    for i in xrange(nrow):
        limit = nrowt[i]
        for _ in xrange(limit):
            for j in xrange(ncol):
                if nvect[count] <= nsubt[j]:
                    count += 1
                    imatrix[i][j] += 1
                    break
    return imatrix
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20141222/f6eafe0a/attachment.html>


More information about the SciPy-Dev mailing list