Optimize function similar to dict.update() but adds common values
Steve Holden
steve at holdenweb.com
Wed Dec 14 12:16:20 EST 2005
Gregory Piñero wrote:
[top-posting rearranged]
> On 12/14/05, Peter Otten <__peter__ at web.de> wrote:
>
>>Gregory Piñero wrote:
>>
>>
>>>def add_freqs(freq1, freq2):
>>>    """Add two word freq dicts"""
>>>    newfreq = {}
>>>    for key, value in freq1.items():
>>>        newfreq[key] = value + freq2.get(key, 0)
>>>    for key, value in freq2.items():
>>>        newfreq[key] = value + freq1.get(key, 0)
>>>    return newfreq
>>
>>>Any ideas on doing this task a lot faster would be appreciated.
>>
>>With items() you copy the whole dictionary into a list of tuples;
>>iteritems() just walks over the existing dictionary and creates one tuple
>>at a time.
>>
>>With "80% overlap", you are looking up and setting four out of five values
>>twice in your for-loops.
>>
>>Dump the symmetry and try one of these:
>>
>>def add_freqs2(freq1, freq2):
>>    total = dict(freq1)
>>    for key, value in freq2.iteritems():
>>        if key in freq1:
>>            total[key] += value
>>        else:
>>            total[key] = value
>>    return total
>>
>>def add_freqs3(freq1, freq2):
>>    total = dict(freq1)
>>    for key, value in freq2.iteritems():
>>        try:
>>            total[key] += value
>>        except KeyError:
>>            total[key] = value
>>    return total
>>
>>My guess is that add_freqs3() will perform best.
>>
> Thanks Peter, those are some really good ideas. I can't wait to try
> them out tonight.
>
> Here's a question about your functions. If I only look at the keys in
> freq2 then won't I miss any keys that are in freq1 and not in freq2?
> That's why I have the two loops in my original function.
>
No, because the statement total = dict(freq1) creates total as a
shallow copy of freq1, so every key in freq1 is already present in
total before the loop starts. All that remains to be done is to add
the items from freq2.
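
A quick way to check this is to run add_freqs3() on two small dicts
that each have a key the other lacks. The sketch below is only an
illustration: the sample data is made up, and it uses the Python 3
spelling items() where the quoted code uses iteritems().

```python
def add_freqs3(freq1, freq2):
    """Return a new dict holding the summed counts of freq1 and freq2."""
    # Shallow copy: every key of freq1 is in total before the loop runs.
    total = dict(freq1)
    # Only freq2 needs to be walked (iteritems() in Python 2).
    for key, value in freq2.items():
        try:
            total[key] += value   # key is in both dicts
        except KeyError:
            total[key] = value    # key is only in freq2
    return total


# Made-up sample data: "eggs" is only in freq1, "ham" only in freq2.
freq1 = {"spam": 2, "eggs": 1}
freq2 = {"spam": 3, "ham": 5}

total = add_freqs3(freq1, freq2)
print(total)
```

"eggs" survives in the result even though the loop never visits it,
and freq1 itself is left unmodified because total is a separate dict.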
regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/