iterating through humongously large key lists!!

Tim Peters tim.one at home.com
Wed Apr 25 21:16:50 EDT 2001


[Mark blobby Robinson]
> I needed a non-destructive method so popitem was not a viable option
> ...

[Alex Martelli]
> In many cases it should not be a problem to 'restore' into _another_
> dictionary.  Or do you have many references to the starting dict
> which it would be a problem to update...?  Then you can do the
> same trick again -- sure, it's gonna be slower, but...
>
> The general idea would be:
>
> newdic = {}
> while len(oldict):
>     key, value = oldict.popitem()
>     process(key, value)
>     newdic[key] = value
>
> and now in newdic you have all you earlier had in oldict, and may
> pour it back if need be.  Of course, not thread-safe, not fastest, &c,
> but still it may be useful.
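The quoted pattern, filled out as a self-contained sketch (`process` is not defined in the snippet, so a trivial stand-in is used here):

```python
def process(key, value):
    # Hypothetical stand-in: whatever per-item work you need goes here.
    pass

oldict = {'a': 1, 'b': 2, 'c': 3}

newdic = {}
while len(oldict):
    key, value = oldict.popitem()  # destructively pull one item out
    process(key, value)
    newdic[key] = value            # rebuild into the new dict

# oldict is now empty; newdic holds everything oldict used to.
```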

Well, if he was concerned about the memory burden of materializing the full
list of keys before, *this* one is gonna kill him:  the memory consumed by
oldict does not decrease during this loop, so in the end you have two dicts
of the original size, and dicts consume about 4-5x the memory of a list with
the same # of entries.
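The relative memory cost can be eyeballed under a modern CPython with sys.getsizeof -- a measurement sketch, not from the original post, and the exact ratio varies by Python version and dict load factor:

```python
import sys

n = 10_000
keys = list(range(n))
d = dict.fromkeys(keys)

# Compare the containers' own footprints (the shared key objects
# themselves are not counted by either measurement).
print("dict:", sys.getsizeof(d), "list:", sys.getsizeof(keys))
assert sys.getsizeof(d) > sys.getsizeof(keys)
```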

BTW, I'm describing an accident of the implementation here, not an inherent
feature of dicts, so this may change:  under all implementations of Python
dicts to date, a dict never resizes except possibly in response to
__setitem__.  You can *delete* things from a dict until the sun goes nova (or
Win95 crashes, whichever comes first <wink>), but the dict won't shrink
internally until you start adding keys to it again.  This is partly
intentional: (a) to avoid pointless thrashing for ping-pong dicts, and (b) to
minimize the number of cycles burned checking *whether* to resize.
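That no-shrink-on-delete behavior is still observable in today's CPython via sys.getsizeof (a sketch; absolute sizes vary by version):

```python
import sys

d = {i: None for i in range(10_000)}
full_size = sys.getsizeof(d)

for i in range(10_000):
    del d[i]

# Every entry is gone, but the internal table has not shrunk:
assert len(d) == 0
assert sys.getsizeof(d) == full_size
assert sys.getsizeof(d) > sys.getsizeof({})
```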

A curious fact under 2.0 and earlier is that a dict can resize even when
merely replacing the value for a key that already exists.  We stopped that in
2.1, though.
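Under 2.1-and-later semantics (including today's CPython), replacing the value for an existing key overwrites the slot in place, which can be checked the same way -- again a sketch, not code from the post:

```python
import sys

d = {i: None for i in range(1_000)}
before = sys.getsizeof(d)

d[0] = "replaced"  # existing key: value overwritten, no new slot needed

# The dict's footprint is unchanged by the replacement:
assert sys.getsizeof(d) == before
```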

pragmatically y'rs  - tim
