Tough sorting problem: or, I'm confusing myself

Fri Apr 16 09:14:46 EDT 2010

david jensen wrote:

> Hi all,
> 
> I'm trying to find a good way of doing the following:
> 
> Each n-tuple in combinations( range( 2 ** m ), n ) has a corresponding
> value n-tuple (call them "scores" for clarity later). I'm currently
> storing them in a dictionary, by doing:
> 
> ----
> res={}
> for i in itertools.combinations( range( 2**m ) , n):
>     res[ i ] = getValues( i )    # getValues() is computationally
> expensive
> ----
> 
> For each (n-1)-tuple, I need to find the two numbers that have the
> highest scores versus them. I know this isn't crystal clear, but
> hopefully an example will help: with m=n=3:
> 
> Looking at only the (1, 3) case, assuming:
> getValues( (1, 2, 3) ) == ( -200, 125, 75 )    # this contains the
> highest "other" score, where 2 scores 125
> getValues( (1, 3, 4) ) == ( 50, -50, 0 )
> getValues( (1, 3, 5) ) == ( 25, 300, -325 )
> getValues( (1, 3, 6) ) == ( -100, 0, 100 )    # this contains the
> second-highest, where 6 scores 100
> getValues( (1, 3, 7) ) == ( 80, -90, 10  )
> getValues( (1, 3, 8) ) == ( 10, -5, -5 )
> 
> I'd like to return ( (2, 125), (6, 100) ).
> 
> The most obvious (to me) way to do this would be not to generate the
> res dictionary at the beginning, but just to go through each
> combinations( range( 2**m), n-1) and try every possibility... this
> will test each combination n times, however, and generating those
> values is expensive. [e.g. (1,2,3)'s scores will be generated when
> finding the best possibilities for (1,2), (1,3) and (2,3)]
> 
> What I'm doing now is ugly, and i think is where i'm confusing myself:
> 
> ----
> best2={}
> for i in itertools.combinations( range( 2**m), n-1):
>     scorelist=[]
>     for j in range( 2**m ):
>         if j not in i:
>             k=list(i)
>             k.append(j)
>             k=tuple(sorted(k))    #gets the key for looking up the
> scores in res
>             scorelist.append((j,res[k][k.index(j)]))
>     best2[i]=sorted(scorelist,key=lambda x: -x[1])[:2]
> ----
> 
> Am I missing an obviously better way?

After some tinkering I came up with

def calculate(getvalues):
    best2 = defaultdict(list)
    for t in combinations(range(2**m), n):
        values = getvalues(t)
        for i, k in enumerate(t):
            best2[t[:i] + t[i+1:]].append((k, values[i]))
    return dict((k, nlargest(2, v, key=itemgetter(1))) for k, v in 
best2.iteritems())

which is concise but slower than your code with Raymond's improvements. I 
then tried inlining the nlargest() operation:

def calculate_inlined(getvalues):
    best2 = {}
    for t in combinations(range(2**m), n):
        values = getvalues(t)
        for i, k in enumerate(t):
            key = t[:i] + t[i+1:]
            value = values[i]
            if key not in best2:
                best2[key] = [(k, value), (None, None)]
            else:
                a, b = best2[key]
                if value > a[1]:
                    best2[key] = [(k, value), a]
                elif value > b[1]:
                    best2[key] = [a, (k, value)]
    return best2

This gives a speed boost (I measured with m=n=5) but is both messy and 
brittle. I've got a hunch that there is a better way; I just don't see it at 
the moment...

Peter