[Tutor] Algorithm for combination analysis suggestion.

Kent Johnson kent37 at tds.net
Thu Feb 11 21:21:51 CET 2010


On Thu, Feb 11, 2010 at 1:59 PM, Matthew Matson <gtxy37 at hotmail.com> wrote:
>
> Hi Tutors,
>
> I am looking for the proper approach regarding the analysis of a dictionary
> of combinations I have.
>
> What I need to do is read from a supplied text file that has a unique ID and
> that unique ID's associated combination of elements. So let's say I have the
> following lines in a text file (real file could be millions of lines):
>
> "ID"    "Elements"
> 1    'A, B, C, D'
> 2    'A, D'
> 3    'D, E'
> 4    'A, D'
> 5    'A, B'
> 6    'A, C, D'
>
> and I do something like...
>
> combinationDict = {}
> for line in file:
>     data = line.split('\t')
>     comb = tuple(data[1].split(','))
>     if comb not in combinationDict:
>         combinationDict[comb] = 1
>     else:
>         combination[comb] +=1

Use
  combinationDict = collections.defaultdict(int)

Then you can just write
  combination[comb] += 1
without the test for comb in combinattionDict.

> Now after I read all of the data I end up with a dictionary with the
> combination as the key and the associated total qty as its value.
>
> print combinationDict
> {('A','B','C','D'):1, ('A','D'):2, ('D','E'):1, ('A','B'):1, ('A', 'C',
> 'D'):1}
>
> What I am looking for is a starting point for a solution in python to
> analyze the combination list so that I can determine for example that ('A',
> 'D') is the most popular combination and then determining how many other
> combinations in the dictionary contain this combination.

maxComb = max(combinationDict, key=combinationDict.__getitem__) will
give you the single key with the largest count.

maxCombSet = set(maxComb)
[ comb for comb in combinationDict if maxCombSet <= set(comb) ]

will give you a list of all the combinations that contain the max
though it could be slow if you have lots of unique combinations
(because of all the set conversions).

> I would like to incorporate some parameters so for example the combination
> ('A','B','C','D') and ('A', 'C', 'D') contain ('A','D') so they are valid
> but I could also say that as long as one element is contained in a
> combination it is valid as well provided I add no more than one additional
> item to the combination.

Now you are starting to lose me but you can modify the conditional in
the above list comprehension to make whatever kind of test you want.

> If I apply this logic then ('D','E') can ('A','B')
> can contain ('A', 'D') and if I apply this to the combination dictionary I
> have:
>
> {('B','C', ('A', 'D')):1, ('A','D'):2, ('E', ('A', 'D')):1, ('B', ('A',
> 'D')):1, ('C', ('A', 'D')):1}
>
> which I could then query the keys for ('A', 'D') inclusion to get a total of
> 4 for ('A', 'D').

Now you have lost me completely. What are the keys in this new dict?
How do you get a total of 4 fro ('A', 'D')?

Kent

> I hope this isn't too long and confusing but I am looking for an approach
> where I can analyze for the highest quantity of combinations and then
> iterate through the dictionary substituting those combinations that were
> determined a "highest qty" combination into other low qty combinations when
> valid.
>
> I was hoping to have parameters to qualify a high qty combination (e.g.
> every combination with qty above 10,000) with the highest quantity of that
> determined set taking precedence for substitution for the first pass then
> moving on to the next highest combination for the second pass of
> substitution etc.. The other parameter would be for the combination that
> would receive a substitution whereby I might say that I can only substitute
> if a substitution results in only one additional (superfluous) value being
> added to the combination existing low qty combination.
>
> I have looked around and this sounds like it might be similar to a packing
> problem and in particular the knapsack problem but I can't seem to wrap my
> head around an approach for this in python. I am not looking for a solution
> just some guidance on a starting point or perhaps libraries that may be
> helpful.
>
> Thank you.
>
>
> ________________________________
> Windows® phone-your Windows stuff, on the go. See more.
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>
>


More information about the Tutor mailing list