[Python-Dev] shrinking dicts

Vladimir Marangozov Vladimir.Marangozov@inrialpes.fr
Wed, 11 Aug 1999 15:33:17 +0100 (NFT)


Tim Peters wrote:
> 
> [Vladimir]
> > Currently, dictionaries always grow until they are deallocated from
> > memory.
> 
> It's more accurate to say they never shrink <0.9 wink>.  Even that has
> exceptions, though, starting with:
> 
> > This happens in PyDict_SetItem according to the following
> > code (before inserting the new item into the dict):
> >
> >         /* if fill >= 2/3 size, double in size */
> >         if (mp->ma_fill*3 >= mp->ma_size*2) {
> >                 if (dictresize(mp, mp->ma_used*2) != 0) {
> >                         if (mp->ma_fill+1 > mp->ma_size)
> >                                 return -1;
> >                 }
> >         }
> 
> This code can shrink the dict too.  The load factor computation is based on
> "fill", but the resize is based on "used".  If you grow a huge dict, then
> delete all the entries one by one, "used" falls to 0 but "fill" stays at its
> high-water mark.  At least 1/3rd of the entries are NULL, so "fill"
> continues to climb as keys are added again:  when the load factor
> computation triggers again, "used" may be as small as 1, and dictresize can
> shrink the dict dramatically.

Thanks for clarifying this!

> [snip]
> 
> > ...
> > Any other ideas on how to deal with this? Thoughts, comments?
> 
> Just that slowing the expected case to prevent theoretical bad cases is
> usually a net loss -- I think the onus is on you to demonstrate that this
> change is an exception to that rule.

I won't, because this case is rare in practice, classifying it already
as an exception. A real exception. I'll have to think a bit more about
all this. Adding 1/3 new entries to trigger the next resize sounds
suboptimal (if it happens at all).

> I do recall one real-life complaint
> about it on c.l.py a couple years ago:  the poster had a huge dict,
> eventually deleted most of the items, and then kept it around purely for
> lookups.  They were happy enough to copy the dict into a fresh one a
> key+value pair at a time; today they could just do
> 
>     d = d.copy()
> 
> or even
> 
>     d.update({})
> 
> to shrink the dict.
> 
> It would certainly be good to document these tricks!

I think that officializing these tricks in the documentation is a bad idea.

> 
> if-it-wasn't-a-problem-the-first-8-years-of-python's-life-it's-hard-to-
>     see-why-1999-is-special-ly y'rs  - tim
> 

This is a good (your favorite ;-) argument, but don't forget that you've
been around, teaching people various tricks.

And 1999 is special -- we just had a solar eclipse today, the next being
scheduled for 2081.

-- 
       Vladimir MARANGOZOV          | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252