Optimisation Hints (dict processing and strings)

Kent Johnson kent37 at tds.net
Wed Mar 30 07:18:51 EST 2005


OPQ wrote:
>>>for (2):
>>>for k in hash.keys()[:]: # Note: There may be a lot of keys here
>>>   if len(hash[k])<2:
>>>      del hash[k]
>>
> 
>>- use the dict.iter* methods to prevent building a list in memory. You 
>>shouldn't use these values directly to delete the entry as this could 
>>break the iterator:
>>
>>for key in [k for (k, v) in hash.iteritems () if len (v) < 2]:
>>     del hash[key]
>>
> 
> 
> I'm gonna try, but I think that would be overkill: a whole list has to be
> computed!

Yes, but it is smaller than the list returned by hash.keys(), so it should be a win over what you 
were doing originally. Plus it avoids a lookup (hash[k]), which may also improve the speed.
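
For what it's worth, here is a self-contained sketch of that pattern. The toy
dict below is made up, since I don't know what your real data looks like:

##

# Hypothetical stand-in for the real dict of sequences
hash = {1: 'a', 2: 'bc', 3: '', 4: 'def'}

# Collect the keys to drop first, then delete them, so the dict is never
# mutated while we are iterating over it
for key in [k for (k, v) in hash.iteritems() if len(v) < 2]:
    del hash[key]

print hash   # leaves {2: 'bc', 4: 'def'}

##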

BTW I have long assumed that iterating key, value pairs of a dict using iteritems() is faster than 
iterating with keys() followed by a lookup, since the former method should be able to avoid actually 
hashing the key and looking it up.

I finally wrote a test, and my assumption seems to be correct; using iteritems() is about 1/3 faster 
for simple keys.

Here is a simple test:

##

# Build a test dict with 10000 simple integer keys
d = dict((i, i) for i in range(10000))

def withItems(d):
    # Iterate key/value pairs directly; no separate lookup is needed
    for k, v in d.iteritems():
        pass

def withKeys(d):
    # Iterate the keys, then hash each key again to look up its value
    for k in d:
        d[k]

from timeit import Timer

# Time 1000 full passes over the dict with each approach
for fn in [withItems, withKeys]:
    name = fn.__name__
    timer = Timer('%s(d)' % name, 'from __main__ import d, %s' % name)
    print name, timer.timeit(1000)

##

I get
withItems 0.980311184801
withKeys 1.37672944466

Kent
