[Python-Dev] shrinking dicts

Vladimir Marangozov Vladimir.Marangozov@inrialpes.fr
Tue, 10 Aug 1999 13:04:27 +0100 (NFT)


Currently, dictionaries always grow until they are deallocated from
memory. This happens in PyDict_SetItem according to the following
code (before inserting the new item into the dict):

        /* if fill >= 2/3 size, double in size */
        if (mp->ma_fill*3 >= mp->ma_size*2) {
                if (dictresize(mp, mp->ma_used*2) != 0) {
                        if (mp->ma_fill+1 > mp->ma_size)
                                return -1;
                }
        }

The symmetric case is missing and this has intrigued me for a long time,
but I've never had the courage to look deeply into this portion of code
and try to propose a solution. Which is: reduce the size of the dict by
half when the nb of used items <= 1/6 the size.

This situation occurs far less frequently than dict growing, but anyways,
it seems useful for the degenerate cases where a dict has a peek usage,
then most of the items are deleted. This is usually the case for global
dicts holding dynamic object collections, etc.

A bonus effect of shrinking big dicts with deleted items is that
the lookup speed may be improved, because of the cleaned <dummy> entries
and the reduced overall size (resulting in a better hit ratio).

The (only) solution I could came with for this pb is the appended patch.
It is not immediately obvious, but in practice, it seems to work fine.
(inserting a print statement after the condition, showing the dict size
 and current usage helps in monitoring what's going on).

Any other ideas on how to deal with this? Thoughts, comments?

-- 
       Vladimir MARANGOZOV          | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

-------------------------------[ cut here ]---------------------------
*** dictobject.c-1.5.2	Fri Aug  6 18:51:02 1999
--- dictobject.c	Tue Aug 10 12:21:15 1999
***************
*** 417,423 ****
  	ep->me_value = NULL;
  	mp->ma_used--;
  	Py_DECREF(old_value); 
! 	Py_DECREF(old_key); 
  	return 0;
  }
  
--- 417,430 ----
  	ep->me_value = NULL;
  	mp->ma_used--;
  	Py_DECREF(old_value); 
! 	Py_DECREF(old_key);
! 	/* For bigger dictionaries, if used <= 1/6 size, half the size */
! 	if (mp->ma_size > MINSIZE*4 && mp->ma_used*6 <= mp->ma_size) {
! 		if (dictresize(mp, mp->ma_used*2) != 0) {
! 			if (mp->ma_fill > mp->ma_size)
! 				return -1;
! 		}	  
! 	}
  	return 0;
  }