Optimize function similar to dict.update() but adds common values

Steve Holden steve at holdenweb.com
Wed Dec 14 12:16:20 EST 2005


Gregory Piñero wrote:
[top-posting rearranged]
> On 12/14/05, Peter Otten <__peter__ at web.de> wrote:
> 
>>Gregory Piñero wrote:
>>
>>
>>>def add_freqs(freq1, freq2):
>>>    """Add two word freq dicts"""
>>>    newfreq = {}
>>>    for key, value in freq1.items():
>>>        newfreq[key] = value + freq2.get(key, 0)
>>>    for key, value in freq2.items():
>>>        newfreq[key] = value + freq1.get(key, 0)
>>>    return newfreq
>>
>>>Any ideas on doing this task a lot faster would be appreciated.
>>
>>With items() you copy the whole dictionary into a list of tuples;
>>iteritems() just walks over the existing dictionary and creates one tuple
>>at a time.
>>
>>With "80% overlap", you are looking up and setting four out of five values
>>twice in your for-loops.
>>
>>Dump the symmetry and try one of these:
>>
>>def add_freqs2(freq1, freq2):
>>    total = dict(freq1)
>>    for key, value in freq2.iteritems():
>>        if key in freq1:
>>            total[key] += value
>>        else:
>>            total[key] = value
>>    return total
>>
>>def add_freqs3(freq1, freq2):
>>    total = dict(freq1)
>>    for key, value in freq2.iteritems():
>>        try:
>>            total[key] += value
>>        except KeyError:
>>            total[key] = value
>>    return total
>>
>>My guess is that add_freqs3() will perform best.
>>
 > Thanks Peter, those are some really good ideas.  I can't wait to try
 > them out tonight.
 >
 > Here's a question about your functions.  If I only look at the keys in
 > freq2 then won't I miss any keys that are in freq1 and not in freq2?
 > That's why I have the two loops in my original function.
 >
No, because the statement

   total = dict(freq1)

creates total as a shallow copy of freq1. Thus all that remains to be done
is to add the items from freq2.
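
A quick way to convince yourself is to run Peter's add_freqs3() on two
small dicts where each has a key the other lacks. The sketch below is only
an illustration: it assumes Python 3, so it spells iteritems() as items(),
and the sample dicts are made up for the purpose.

```python
# Sketch only: Peter's add_freqs3(), with items() standing in for
# iteritems() on the assumption that this runs under Python 3.
def add_freqs3(freq1, freq2):
    total = dict(freq1)              # shallow copy: every key of freq1 is already here
    for key, value in freq2.items():
        try:
            total[key] += value      # key is in both dicts: add the counts
        except KeyError:
            total[key] = value       # key is only in freq2: insert it
    return total

# Hypothetical sample data: "spam" is only in freq1, "ham" only in freq2.
freq1 = {"spam": 2, "eggs": 1}
freq2 = {"eggs": 4, "ham": 3}

result = add_freqs3(freq1, freq2)
print(result)
```

"spam" survives in the result even though the loop never visits it,
because it was carried over by the copy, and freq1 itself is left
unmodified.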

regards
  Steve
-- 
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC                     www.holdenweb.com
PyCon TX 2006                  www.python.org/pycon/


