selecting dictionaries to maximize counts
Steven Bethard
steven.bethard at gmail.com
Fri Feb 18 15:13:21 EST 2005
I have a list of dictionaries. Each dictionary holds counts of various
'words', e.g.:
py> countdicts = [
... dict(a=9, b=9, c=9),
... dict(a=8, b=7),
... dict(a=4, b=5, c=12)]
I need to select dicts with the constraint that the number of each
'word' totalled over all selected dicts doesn't exceed a given
MAX_VALUE. Right now, I do this by:
py> def select(dicts, n):
... result = []
... counts = {}
... def doadd(d):
... for label, count in d.iteritems():
... if counts.get(label, 0) + count > n:
... return False
... return True
... for d in dicts:
... if doadd(d):
... result.append(d)
... for label, count in d.iteritems():
... counts[label] = counts.get(label, 0) + count
... return result
...
Basically, I use a greedy approach -- adding a dict each time if I can.
This leads to some suboptimal solutions given that, while the total
counts must not exceed MAX_VALUE, I'd like them to be as close to
MAX_VALUE as possible. An example of data on which my select function
produces such a suboptimal solution:
py> countdicts = [
... dict(a=9, b=9, c=9),
... dict(a=8, b=7),
... dict(a=4, b=5, c=12)]
py> select(countdicts, 12)
[{'a': 9, 'c': 9, 'b': 9}]
The better solution would have been to select the second two dicts --
then all 'words' would have count 12.
I don't need an optimal solution; in fact, the solution I have right now
is tolerable. But if anyone knows a simple way to improve my
performance, I'd appreciate it. Thanks!
STeVe
P.S. No, I'm not just making these problems up! I'm sick, but not that
sick. ;) It has to do with resampling a set of data... If you're
really interested, I can provide the full details, but they'll only make
the problem even _more_ complicated. =)
More information about the Python-list
mailing list