testing for uniquness in a large list

Cliff Wells clifford.wells at comcast.net
Wed Oct 20 07:07:20 EDT 2004


On Wed, 2004-10-20 at 11:01 +0100, Lol McBride wrote:
> Hi All,
> I'm looking for some help in developing a way to test for the uniqueness
> of an item in a large list.To illustrate what I mean the code below is an
> indicator of what I want to do but I'm hoping for a faster way than the
> one shown.Basically,I have a list of 20 integers and I want to create
> another list of 200000 unique subsets of 12 integers from this list.WhatI
> have done here is to use the sample()function from the random module
> and then compare the result to every item in the ints list to check for
> uniqueness - as you can guess this takes an interminable amount of time to
> grind through.Can someone please enlighten me as to how I can do this and
> keep the amount of time to do it to a minimum?
> 
> Thanks for taking the time to read this and doubly so if you're able to
> help.
> Cheers,
> Lol McBride
> 
> #!/usr/bin/python
> from random import *
> #seq is the pool to choose from
> seq = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
> # this is how many times to choose a unique selection
> rlen = 200000
> counter = 100
> ints = [0]
> while 1:
>     # if this is the last value for rlen then we stop
>     if rlen == 0:
> 	break
>     intLen = len(ints)
>     cnt = 0
>     testVal = sample(seq,12)
>     testVal.sort()
>     if len(ints)>=counter:
> 	print len(ints)
> 	counter = counter+100
>     while 1:
> 	if ints.count(testVal)==1:
>             cnt = cnt+1
> 	    break
> 	intLen = intLen -1
> 	if intLen == 0:
> 	    break
> 	if cnt == 1:
> 		continue
> 	else:
> 	    ints.append(testVal)
> 	rlen = rlen -1
> 

Disregarding more efficient methods of generating your list (of which
I'm sure there are many), part of the "interminable time" is probably
taken up in the infinite loop you have ;)  Set rlen to a much smaller
number and you'll see that the program still never terminates.

You'll notice that:

ints = [0]
while 1:
    ...
    intLen = len(ints) # aka 1
    ...
    while 1:
        ...
        intLen = intLen - 1 # aka 0
        if intLen == 0: # always true
            break
        ...



Basically this means that nothing is ever appended to ints nor is rlen
ever decremented (and these are the tests by which the loops terminate).

I hear that Python 3000 will complete infinite loops in half the time
however ;)

Regards,
Cliff

-- 
Cliff Wells <clifford.wells at comcast.net>




More information about the Python-list mailing list