Optimisation Hints (dict processing and strings)
Daniel Dittmar
daniel.dittmar at sap.corp
Wed Mar 30 04:34:22 EST 2005
OPQ wrote:
>>- Try if it isn't faster to iterate using items instead of iterating
>>over keys
>
>
> items are huge lists of numbers. keys are simple small strings. And
> even if it is faster, how can I find the key back, in order to delete
> it ?
> for v in hashh.items():
>     if len(v)<2:
>         del ???????
>
To elaborate on the memory requirements of .keys () vs. .items ():
.keys () creates a new list of n objects. The objects are additional
references to the existing keys.
.items () also creates a new list of n objects. These objects are tuples
of references, one to the key and one to the value. Only references
are used, so it doesn't matter how large the value actually is. Whether
the tuples are created for the .items () call or already exist depends on
the implementation of the dictionary. Trying to infer this by using
sys.getrefcount gave me inconclusive results.
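A small sketch of the point about references, assuming a current
Python 3, where .keys () and .items () return views, so list (...) is
used to get the real lists described above. The sample data and the
names same_value and tuple_sizes are made up for illustration.

```python
import sys

# Hypothetical data: short string keys, one huge value and one tiny value.
hashh = {"a": list(range(100000)), "b": [1]}

# list(...) builds real lists, as .keys() and .items() did in Python 2.
keys = list(hashh.keys())
items = list(hashh.items())

# Each tuple refers to the existing value object; nothing is copied.
same_value = all(v is hashh[k] for k, v in items)

# The shallow size of each (key, value) tuple does not depend on
# how large the value it refers to is.
tuple_sizes = {sys.getsizeof(t) for t in items}
```

Here same_value is true and tuple_sizes contains a single size, even
though one value holds 100000 numbers and the other only one.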
> I gonna try, but think that would be overkill: a whole list has to be
> computed !
> Maybe whith genexps ...... for key in (k for (k,v) in hash.iteritems()
> if len(v)<2)
Using only iterators has problems:
    for k,v in hash.iteritems ():
        if len (v) < 2:
            del hash [k]
You are changing hash while you iterate over it. This very often breaks
the iterator, typically with a RuntimeError saying that the dictionary
changed size during iteration.
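A minimal sketch of the problem and of one possible way around it,
written for a current Python 3 (.items () instead of .iteritems ()).
The sample data and the names broke and doomed are made up. The second
half collects only the keys to delete, so the extra list holds
references to small strings and never touches the large values.

```python
# Hypothetical sample data: small string keys, lists of numbers as values.
hashh = {"a": [1], "b": [1, 2, 3], "c": []}

# Deleting inside the loop changes the dict's size during iteration.
broke = False
try:
    for k, v in hashh.items():
        if len(v) < 2:
            del hashh[k]
except RuntimeError:
    broke = True

# Two-pass alternative: first collect the keys, then delete them.
hashh = {"a": [1], "b": [1, 2, 3], "c": []}
doomed = [k for k, v in hashh.items() if len(v) < 2]
for k in doomed:
    del hashh[k]
```

The list doomed is at most as long as .keys () would be, and usually
shorter, because it only contains the keys that are going to be removed.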
If you are memory bound, maybe a database like SQLite is really the way to
go. Or you could write the keys to a temporary file in the loop and then
write a second loop that reads the keys and deletes them from hash.
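The temporary file variant could look roughly like this. It is only a
sketch: the function name prune_via_tempfile and the parameter min_len
are made up, and it assumes that the keys are strings without newline
characters, so that one key per line is enough.

```python
import tempfile

def prune_via_tempfile(hashh, min_len=2):
    """Delete entries whose value is shorter than min_len.

    The keys to delete are spooled to a temporary file instead of
    being kept in an in-memory list.
    """
    removed = 0
    with tempfile.TemporaryFile(mode="w+", encoding="utf-8") as tmp:
        # First loop: only read from the dict, write doomed keys to disk.
        for k, v in hashh.items():
            if len(v) < min_len:
                tmp.write(k + "\n")
        # Second loop: read the keys back and delete them.
        tmp.seek(0)
        for line in tmp:
            del hashh[line.rstrip("\n")]
            removed += 1
    return removed

# Hypothetical sample data.
sample = {"a": [1], "b": [1, 2, 3], "c": []}
removed_count = prune_via_tempfile(sample)
```

The dictionary is not modified while it is being iterated, so the
iterator stays valid, and the file is removed automatically when the
with block ends.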
Daniel