[SciPy-user] Combinatoric/Statistics: length of longest sequence

A. M. Archibald peridot.faceted at gmail.com
Sun Oct 29 18:14:00 EST 2006


On 29/10/06, Dr. Hans Georg Krauthaeuser <hgk at et.uni-magdeburg.de> wrote:
> Dear All,
>
> I have a 2d-array A of p-values (coming from pairwise statistical tests
> of N datasets). N is typically in the range of 100-1000. Now, given a
> certain threshold, I want to find the longest subset S of range(N), so
> that A[i][j]>threshold for all i,j from S.

It's not quite clear, here, whether you want the longest *contiguous*
subset ({2,3,4,5,6}) or the largest arbitrary subset ({2,17,21,22}).

> Unfortunately, the array A is not 'sparse', i.e. a lot of pairs (i,j)
> will fulfill A[i][j]>threshold (typically ~50% of the elements).
>
> All I can think of are brute force algorithms that probably will run
> forever because of the large number of possible combinations to check.

If you want the longest contiguous subset, for N=1000 that's only of
order a million operations for a brute-force algorithm (for each
starting element, see how long the run is), so unless you want to do
it lots of times I wouldn't worry. If you do, some Numeric cleverness
might allow you to move some of the heavy lifting to C, though for
simple indexing calculations pure python can be plenty fast.

If you want the largest arbitrary subset, that's going to be
expensive, and it bears an unnerving resemblance to the subset-sum
problem. I can imagine an expensive  recursive algorithm that would do
it without too much pain, though the space to search is so huge you're
liable to be taking a very long time. However you go about it, you may
be better off using python's set datatype (for element i, form the set
of all elements j for which A[i][j]>threshold).

I've attached a demonstration program that solves both interpretations
of your problem; the difficult one appears to bog down seriously
around N=200 (with p=0.5) but the easy one runs into memory problems
before taking too long.

A. M. Archibald
-------------- next part --------------
A non-text attachment was scrubbed...
Name: settrick.py
Type: text/x-python
Size: 1055 bytes
Desc: not available
URL: <http://mail.scipy.org/pipermail/scipy-user/attachments/20061029/94aa24c8/attachment.py>


More information about the SciPy-User mailing list