selecting dictionaries to maximize counts

Fri Feb 18 15:13:21 EST 2005

I have a list of dictionaries.  Each dictionary holds counts of various 
'words', e.g.:

py> countdicts = [
...     dict(a=9, b=9, c=9),
...     dict(a=8, b=7),
...     dict(a=4, b=5, c=12)]

I need to select dicts with the constraint that the number of each 
'word' totalled over all selected dicts doesn't exceed a given 
MAX_VALUE.  Right now, I do this by:

py> def select(dicts, n):
...     result = []
...     counts = {}
...     def doadd(d):
...         for label, count in d.iteritems():
...             if counts.get(label, 0) + count > n:
...                 return False
...         return True
...     for d in dicts:
...         if doadd(d):
...             result.append(d)
...             for label, count in d.iteritems():
...                 counts[label] = counts.get(label, 0) + count
...     return result
...

Basically, I use a greedy approach -- adding a dict each time if I can. 
  This leads to some suboptimal solutions given that, while the total 
counts must not exceed MAX_VALUE, I'd like them to be as close to 
MAX_VALUE as possible.  An example of data on which my select function 
produces such a suboptimal solution:

py> countdicts = [
...     dict(a=9, b=9, c=9),
...     dict(a=8, b=7),
...     dict(a=4, b=5, c=12)]
py> select(countdicts, 12)
[{'a': 9, 'c': 9, 'b': 9}]

The better solution would have been to select the second two dicts -- 
then all 'words' would have count 12.

I don't need an optimal solution; in fact, the solution I have right now 
is tolerable.  But if anyone knows a simple way to improve my 
performance, I'd appreciate it.  Thanks!

STeVe

P.S.  No, I'm not just making these problems up!  I'm sick, but not that 
sick. ;)  It has to do with resampling a set of data...  If you're 
really interested, I can provide the full details, but they'll only make 
the problem even _more_ complicated. =)