[SciPy-Dev] (no subject)

Sturla Molden sturla.molden at gmail.com
Tue Dec 23 06:33:30 EST 2014


As Joseph said, we cannot have LGPL code in SciPy, but we can use the 
original paper as source. The license must be SciPy's version of the BSD 
license, or at least compatible.

Some comments on your code:

The inner loop should be vectorized. If not possible (I have not looked 
at it carefully) you should consider to use Cython or Fortran (f2py).

imatrix[i][j] is a common beginner's mistake. Use imatrix[i,j] (yes it 
matters) or preferably vectorize (cf. comment above).

The function should take a RandomState as an optional argument.

For Monte Carlo, it would be nice to have a repeats argument as well, to 
squash some of the Python overhead.

If you add a repeats argument it is better to use Cython or Fortran 
because numpy.random.shuffle and numpy.random.permutation do not take an 
axis argument, hence you cannot vectorize the suffling. (This is a 
problem in NumPy we should fix.)

All in all, I would say this points to Cython based solution.


Sturla



On 23/12/14 01:47, Clint Valentine wrote:
> I ported an algorithm originally written in MatLab to Python for
> generating a random contingency table (when one exists) for a
> statistical test I am writing. I know something like this doesn't exist
> in scipy.stats.contingency and was wondering if it would be a good
> addition? I haven't developed any tests for the function yet but can if
> needed.
>
> I have just started developing in Python as an undergraduate student
> studying computational biology. This would be my first time contributing
> to open source and I am looking forward to the the experience and all
> the advice you are willing to offer. Thanks for your time.
>
> -Clint
>
> def rcont(ncol, nrow, ncolt, nrowt):
>      """
>      Return a random contingency table based on the two-dimensional shape
>      and marginal totals of a given contingency table
>
>      There must be 1 or many tables with nonnegative, integral entries
>      that have the given shape and marginal sums as input. The function
>      will generate, at random, one of the tables and return it. Repeated
>      calls to the function will return new random selections.
>
>      Parameters:
>      -----------
>      ncol : integer
>          The vector length of the columns in the contingency table
>
>      nrow : integer
>          The vector length of the rows in the contingency table
>
>      ncolt : array_like
>          The vector of the marginal column totals in the contingency table
>
>      nrowt : array_like
>          The vector of the marginal row totals in the contingency table
>
>      Returns
>      -------
>      imatrix : ndarray of integers
>          The randomly generated two-dimensional ndarray preserving the
>          same shape and marginal totals provided if exists
>
>      Reference
>      ---------
>      Algorithm AS144:  Appl. Statist. (1979) v.28, pp.329-332
>
>      Licensing:
>          GNU LGPL license
>
>      Author:
>          Original FORTRAN77 version by James Boyett.
>          MATLAB version by John Burkardt.
>          Python version by Clint Valentine
>
>      Reference:
>          James Boyett,
>          Algorithm AS 144:
>          Random R x C Tables with Given Row and Column Totals,
>          Applied Statistics,
>          Volume 28, Number 3, pages 329-332, 1979.
>      """
>      # Initialize and permute vector
>      nvect = np.array(xrange(sum(ncolt)), dtype=np.int_)
>      np.random.shuffle(nvect)
>
>      # Construct random matrix and partial column sums
>      imatrix = np.zeros([nrow, ncol], dtype=np.int_)
>      nsubt = np.cumsum(ncolt)
>
>      # Generate random RxC contingency tables preserving shape and
>      # marginal column and row sums and total sum.
>      count = 0
>      for i in xrange(nrow):
>          limit = nrowt[i]
>          for _ in xrange(limit):
>              for j in xrange(ncol):
>                  if nvect[count] <= nsubt[j]:
>                      count += 1
>                      imatrix[i][j] += 1
>                      break
>      return imatrix
>
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>





More information about the SciPy-Dev mailing list