[SciPy-user] Combinatoric/Statistics: length of longest sequence

Sun Oct 29 08:32:23 EST 2006

Dear All,

I have a 2d-array A of p-values (coming from pairwise statistical tests
of N datasets). N is typically in the range of 100-1000. Now, given a
certain threshold, I want to find the longest subset S of range(N), so
that A[i][j]>threshold for all i,j from S.

Unfortunately, the array A is not 'sparse', i.e. a lot of pairs (i,j)
will fulfill A[i][j]>threshold (typically ~50% of the elements).

All I can think of are brute force algorithms that probably will run
forever because of the large number of possible combinations to check.

It would be great if someone could give a pointer to a smart algorithm.

If it is not possible to give the exact answer, what algorithm could be
used to get a good approximative answer?

Regards
Hans Georg