[Pythonmac-SIG] Fast way to get a list of unique values?

Robb Brown brownr@ucalgary.ca
Tue, 10 Sep 2002 11:29:49 -0600


Thanks for all the replies!

Much as I'm tempted to use your first method.... :)

The array is a volume mask for medical data.  The problem is that 
there may be several masks in the same file (several different 
segmented tissues).  If I have a list of the unique values, I know 
how many tissues are present and can divide them into separate 
objects.

Because of the size of these volumes I suspect it's time to break out 
the C compiler.




>Quoting Robb Brown <brownr@ucalgary.ca>:
>>
>>  I have a large array (Numeric) of integers.  I'd like to get a list
>>  of all the unique values in that array.  Naturally I'd like to avoid
>>  a for loop.
>
>Here's one way.  It does avoid using a for loop.
>I'm guessing that it isn't quite exactly what you wanted.
>Fun exercise, though.   See below for a more serious suggestion.
>(This is *Python*, after all...)
>
>#!/usr/bin/env python
>#
>#  Robb has a "large array (Numeric) of integers.  I'd like to get
>#   a list of all the unique values in that array.  Naturally I'd like
>#   to avoid a for loop.  Is there an easy way to do this in Python?
>#
>#  No mention of speed ...
>#
>
>from operator import __mul__
>
>#  --- from the cookbook, w/ slight mods --
># Sieve of Eratosthenes
># David Eppstein, UC Irvine, 28 Feb 2002
># From the ASPN Python cookbook ...
>#  http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/117119
>#
>from __future__ import generators
>
>def eratosthenes ():
>     '''Yields the sequence of prime numbers via the Sieve of Eratosthenes.'''
>     D = {}  # map composite integers to primes witnessing their compositeness
>     q = 2   # first integer to test for primality
>     while 1:
>	if q not in D:
>	    yield q        # not marked composite, must be prime
>             D[q*q] = [q]   # first multiple of q not already marked
>         else:
>             map( lambda p: D.setdefault(p+q,[]).append(p), D[q] )
>             del D[q]       # no longer need D[q], free memory
>         q += 1
>
>primes_iter = eratosthenes()
>primes_list = [ ]
>prim_d = { }
>
>def primes ( n ):
>     while  len(primes_list) <= n:
>         nx = primes_iter.next()
>         primes_list.append( nx )
>         prim_d[ nx ] = len(primes_list)-1
>     return primes_list[n]
>    
>def uniq ( arr ):
>     # map the array into a goedel number of sorts
>     gm = min( arr )
>     gx = max( arr )
>
>     # offset the list to n in [ 0 .. max-min+1 ]
>     # and then find the nth prime
>     bias = map( lambda x: primes(x-gm), arr )
>
>     # blast all those primes together
>     goedel = reduce( __mul__, bias )
>
>     # find the ones that factor into goedel
>     fax = filter( lambda n: goedel%n == 0, primes_list )
>
>     # print len(arr), arr
>     # print bias
>     # print goedel
>     # print len(fax), fax
>     return map( lambda z: prim_d[z] + gm, fax )
>
>if __name__ == "__main__":
>     def test_e ():
>         iter = eratosthenes()
>         lst = []
>         while 1:
>             ff = iter.next()
>             lst.append( ff )
>             print ff
>             if ff > 1000:
>                 break
>         print lst[ -12: ]
>            
>     def test_p ():
>         print primes( 12 )
>
>     def test ():
>         arr = [ -9, -2, -4, 0, 19, 2, 42, -9, -2, -4, 0, 19, 2, 421, 32, 42 ]
>         print uniq( arr )
>
>     test_e()
>     test_p()
>
>     test()
>
>
>>   Is there an easy way to do this in Python?
>
>You do not tell us where you acquired that "large array".
>My suggestion, given the lack of any other detail, would be to see if
>it's possible to create that array as a dict instead (with the numbers
>as keys, as Bob mentioned).   I.e., do the unique-ifying while building
>the array, instead of treating it as a separate problem.
>
>
>
>
>-=-=-=-
>
>food. shelter. clothing. net.         Got.net - The Internet Connection, Inc


-- 
______________________________
Robb Brown
Seaman Family MR Research Centre
Calgary, Alberta, Canada