Optimisation Hints (dict processing and strings)

Wed Mar 30 04:34:22 EST 2005

OPQ wrote:
>>- Try if it isn't faster to iterate using items instead of iterating 
>>over keys
> 
> 
> items are huge lists of numbers. keys are simple small strings. And
> even if it is faster, how can I find the key again in order to delete
> it?
> for v in hashh.items():
>     if len(v)<2:
>            del ???????
> 

To elaborate on the memory requirements of .keys() vs. .items():

.keys() creates a new list of n objects. The objects are additional
references to the existing keys; the keys themselves are not copied.

.items() also creates a new list of n objects. These objects are tuples
of two references, one to the key and one to the value. Only references
are stored, so it doesn't matter how large the value actually is. Whether
the tuples are created for the items() call or already exist depends on
the implementation of the dictionary. Trying to infer this with
sys.getrefcount gave me inconclusive results.
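
A small sketch of the point about references, assuming a current
Python (where keys() and items() return views, so list() is used to get
the lists described above). The names big, pairs and the sample data
are made up for illustration:

```python
import sys

# Hypothetical data: one large value and one small value.
big = list(range(100000))
hashh = {"a": big, "b": [1]}

keys = list(hashh.keys())     # new list of references to the existing keys
items = list(hashh.items())   # new list of (key, value) tuples

# The tuples refer to the very same value objects; nothing is copied.
pairs = dict(items)
same_object = pairs["a"] is big

# Each tuple holds two references, so its size does not depend on
# how large the value it refers to is.
tuple_size = sys.getsizeof(items[0])
other_tuple_size = sys.getsizeof(items[1])
```

Here same_object should be true and both tuple sizes should be equal,
which is why the size of the values does not affect the cost of the
items() list itself.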

> I'm going to try, but I think that would be overkill: a whole list has
> to be computed!
> Maybe with genexps: for key in (k for (k,v) in hash.iteritems()
> if len(v)<2)

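Using only iterators has a problem:

for k, v in hash.iteritems():
    if len(v) < 2:
        del hash[k]

You are changing hash while you iterate over it, which breaks the
iterator: CPython typically raises "RuntimeError: dictionary changed
size during iteration" on the next step of the loop.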
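
One way around this, sketched here with made-up sample data, is to
collect only the keys that should go and delete them after the
iteration has finished. The list doomed holds references to the small
key strings only, never the large values. On the Python of this thread
you would use iteritems() in the first pass to avoid building the full
item list; items() is used below so the sketch also runs on a current
Python:

```python
# Hypothetical sample data: keys are short strings, values are lists.
hashh = {"a": [1, 2, 3], "b": [4], "c": [], "d": [5, 6]}

# Pass 1: collect only the keys to delete (no values are copied).
doomed = [k for k, v in hashh.items() if len(v) < 2]

# Pass 2: the iteration is over, so deleting is safe now.
for k in doomed:
    del hashh[k]
```

The extra memory is proportional to the number of keys being deleted,
not to the size of the dictionary.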

If you are memory bound, maybe a database like SQLite is really the way to
go. Or you could write the keys to a temporary file in the loop and then
write a second loop that reads the keys and deletes them from hash.
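
The temporary-file variant could look roughly like this. It is only a
sketch: the sample data is invented, and it assumes the keys are
strings that contain no newline characters, since one key is written
per line:

```python
import tempfile

# Hypothetical sample data: keys are short strings, values are lists.
hashh = {"a": [1, 2, 3], "b": [4], "c": [], "d": [5, 6]}

with tempfile.TemporaryFile(mode="w+") as tmp:
    # Loop 1: only read from the dict and spill the doomed keys to disk.
    for k, v in hashh.items():
        if len(v) < 2:
            tmp.write(k + "\n")

    # Loop 2: rewind, read the keys back, and delete them from the dict.
    tmp.seek(0)
    for line in tmp:
        del hashh[line.rstrip("\n")]
```

The dictionary is never modified while it is being iterated, and the
list of keys to delete lives on disk instead of in memory.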

Daniel