[Python-Dev] shrinking dicts
Vladimir Marangozov
Vladimir.Marangozov@inrialpes.fr
Wed, 11 Aug 1999 15:33:17 +0100 (NFT)
Tim Peters wrote:
>
> [Vladimir]
> > Currently, dictionaries always grow until they are deallocated from
> > memory.
>
> It's more accurate to say they never shrink <0.9 wink>. Even that has
> exceptions, though, starting with:
>
> > This happens in PyDict_SetItem according to the following
> > code (before inserting the new item into the dict):
> >
> > /* if fill >= 2/3 size, double in size */
> > if (mp->ma_fill*3 >= mp->ma_size*2) {
> > if (dictresize(mp, mp->ma_used*2) != 0) {
> > if (mp->ma_fill+1 > mp->ma_size)
> > return -1;
> > }
> > }
>
> This code can shrink the dict too. The load factor computation is based on
> "fill", but the resize is based on "used". If you grow a huge dict, then
> delete all the entries one by one, "used" falls to 0 but "fill" stays at its
> high-water mark. At least 1/3rd of the entries are NULL, so "fill"
> continues to climb as keys are added again: when the load factor
> computation triggers again, "used" may be as small as 1, and dictresize can
> shrink the dict dramatically.
Thanks for clarifying this!
> [snip]
>
> > ...
> > Any other ideas on how to deal with this? Thoughts, comments?
>
> Just that slowing the expected case to prevent theoretical bad cases is
> usually a net loss -- I think the onus is on you to demonstrate that this
> change is an exception to that rule.
I won't, because this case is rare in practice, classifying it already
as an exception. A real exception. I'll have to think a bit more about
all this. Adding 1/3 new entries to trigger the next resize sounds
suboptimal (if it happens at all).
> I do recall one real-life complaint
> about it on c.l.py a couple years ago: the poster had a huge dict,
> eventually deleted most of the items, and then kept it around purely for
> lookups. They were happy enough to copy the dict into a fresh one a
> key+value pair at a time; today they could just do
>
> d = d.copy()
>
> or even
>
> d.update({})
>
> to shrink the dict.
>
> It would certainly be good to document these tricks!
I think that officializing these tricks in the documentation is a bad idea.
>
> if-it-wasn't-a-problem-the-first-8-years-of-python's-life-it's-hard-to-
> see-why-1999-is-special-ly y'rs - tim
>
This is a good (your favorite ;-) argument, but don't forget that you've
been around, teaching people various tricks.
And 1999 is special -- we just had a solar eclipse today, the next being
scheduled for 2081.
--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252