[BangPypers] Reorder Dictionary Size in python

Harish Vishwanath harish.shastry at gmail.com
Thu Apr 18 10:42:29 CEST 2013


As you may have noticed, dictionary size depends greatly on the order of
insertion. Let us say you are trying to copy the current items from your
existing dict [which could have been re-ordered multiple times because of
collisions and increase in allocated memory]. It is definitely possible
that the order in which you iterate your existing items may not be the most
optimal order, and could generate more collisions and items could end up in
much deeper slots than their original hash values. This will definitely
hamper your lookup time.

But it will probably reduce the memory consumption, since there are
definitely lesser items in the new dictionary.

Regards,
Harish


On Thu, Apr 18, 2013 at 2:00 PM, Rahul R <rahul8590 at gmail.com> wrote:

> Harish , you nailed it!  Thanks for the video.  So, decreasing the size
> will not be possible afterall.
> But what confuses me is why is copying the items to new dictionary will not
> consume less space ? By copying you will get rid of all the dummy variables
> and lookup time will also decrease.
>
> Thanks a lot!
> ./Rahul
>
> On Thu, Apr 18, 2013 at 1:14 PM, Harish Vishwanath <
> harish.shastry at gmail.com
> > wrote:
>
> > Here is a good talk on how dictionaries are implemented in python :
> >
> >
> http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2010-the-mighty-dictionary-55-3352147
> >
> > Due to possibility of collisions in the hash table, dict keeps allocating
> > 4x or 2x the size of existing allocation and *will* reorder the items
> when
> > such an operation happens. They go and acquire more memory when the dict
> is
> > 2/3rds full. But the reverse is not possible. A dummy key has to be
> > inserted in a free slot, so that valid items in the dictionary can be
> > found. Since all this depends on the order of insertion, you cannot
> > guarantee that copying existing items into a new dictionary will consume
> > less space.
> >
> > If look up times is not a concern, and you are trying to optimize for
> space
> > the stackoverflow link that Anand shared earlier could help.
> >
> > Regards,
> > Harish
> >
> >
> > On Thu, Apr 18, 2013 at 12:06 PM, Hrishikesh Kulkarni
> > <rishi at turtleyogi.com>wrote:
> >
> > > On Wed, Apr 17, 2013 at 9:30 PM, Rahul R <rahul8590 at gmail.com> wrote:
> > >
> > > > As far as i know, python performs a lazy deletion of values , when we
> > > > delete content from a dictionary (correct me if i am wrong) .  So,
> when
> > > we
> > > > insert a lot of values the dictionary automatically expands. I don't
> > see
> > > > dict shrinking when we delete values from dictionary. In such case,
> is
> > > > there a way to forcibly reduce the dictionary size ?
> > > >
> > >
> > >
> > > Are you measuring this shrinking?
> > > the following  in dictobject.c should guide you with the internals. The
> > > deleted item is simply replaced with a refcounted dummy. The deleted
> item
> > > is marked for cleanup by gc with decref.
> > >
> > > PyDict_DelItem () :
> > > ..
> > >     old_key = ep->me_key;
> > >     Py_INCREF(dummy);
> > >     ep->me_key = dummy;
> > >     old_value = ep->me_value;
> > >     ep->me_value = NULL;
> > >     mp->ma_used--;
> > >     Py_DECREF(old_value);
> > >     Py_DECREF(old_key);
> > > ..
> > >
> > >
> > > dict_length():
> > > ..
> > >     return mp->ma_used;
> > > ..
> > >
> > > PyDict_Size():
> > > ..
> > >     return ((PyDictObject *)mp)->ma_used;
> > > ..
> > >
> > >
> > >
> > > regards,
> > > Rishi
> > >
> > >
> > >
> > >
> > > >
> > > > ./Rahul
> > > >
> > > > On Wed, Apr 17, 2013 at 9:23 PM, Ramdas S <ramdaz at gmail.com> wrote:
> > > >
> > > > > On Wed, Apr 17, 2013 at 9:22 PM, Rahul R <rahul8590 at gmail.com>
> > wrote:
> > > > >
> > > > > > Ahh , sorry If i wasnt clear the first time. I dint mean reorder
> > the
> > > > data
> > > > > > in dictionary. I meant resize the dictionary.
> > > > > >
> > > > >
> > > > > What do you mean by resize?
> > > > >
> > > > >
> > > > > >
> > > > > > ./Rahul
> > > > > >
> > > > > > On Wed, Apr 17, 2013 at 9:05 PM, Anand Chitipothu <
> > > > anandology at gmail.com
> > > > > > >wrote:
> > > > > >
> > > > > > > dictionaries are unordered. It is not a good idea to expect any
> > > order
> > > > > in
> > > > > > > dictionaries, even if you are seeing some order by chance.
> > > > > > >
> > > > > > > If you need order, then use OrderedDict from collections module
> > > (new
> > > > in
> > > > > > > Python 2.7).
> > > > > > >
> > > > > > > Anand
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Apr 17, 2013 at 9:00 PM, Rahul R <rahul8590 at gmail.com>
> > > > wrote:
> > > > > > >
> > > > > > > > Hey  Guys,
> > > > > > > >
> > > > > > > > Is it possible to forcibly reorder the python dictionary
> after
> > > "n"
> > > > > > number
> > > > > > > > of inserts and deletions. As far as i know, python dictionary
> > > > > performs
> > > > > > > lazy
> > > > > > > > deletes. Thus , even if the data is deleted, python has a
> dummy
> > > > data
> > > > > > > their
> > > > > > > > in order to preserve consistency. The python dictionary
> > > > > > > > keeps expanding when the size of dict is increasing, but
> after
> > > > > > deleting a
> > > > > > > > few parameters the size does not decrease. Is there a way ,
> > > where I
> > > > > can
> > > > > > > > forcibly resize the dictionary ?
> > > > > > > >
> > > > > > > > I was thinking of copying content from existing dictionary to
> > new
> > > > > dict
> > > > > > > and
> > > > > > > > deleting the previous one.But thats a cumbersome operation.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > ./Rahul
> > > > > > > > _______________________________________________
> > > > > > > > BangPypers mailing list
> > > > > > > > BangPypers at python.org
> > > > > > > > http://mail.python.org/mailman/listinfo/bangpypers
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Anand
> > > > > > > http://anandology.com/
> > > > > > > _______________________________________________
> > > > > > > BangPypers mailing list
> > > > > > > BangPypers at python.org
> > > > > > > http://mail.python.org/mailman/listinfo/bangpypers
> > > > > > >
> > > > > > _______________________________________________
> > > > > > BangPypers mailing list
> > > > > > BangPypers at python.org
> > > > > > http://mail.python.org/mailman/listinfo/bangpypers
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Ramdas S
> > > > > +91 9342 583 065
> > > > > My Personal Blog on http://ramdaz.wordpress.com
> > > > > _______________________________________________
> > > > > BangPypers mailing list
> > > > > BangPypers at python.org
> > > > > http://mail.python.org/mailman/listinfo/bangpypers
> > > > >
> > > > _______________________________________________
> > > > BangPypers mailing list
> > > > BangPypers at python.org
> > > > http://mail.python.org/mailman/listinfo/bangpypers
> > > >
> > > _______________________________________________
> > > BangPypers mailing list
> > > BangPypers at python.org
> > > http://mail.python.org/mailman/listinfo/bangpypers
> > >
> > _______________________________________________
> > BangPypers mailing list
> > BangPypers at python.org
> > http://mail.python.org/mailman/listinfo/bangpypers
> >
> _______________________________________________
> BangPypers mailing list
> BangPypers at python.org
> http://mail.python.org/mailman/listinfo/bangpypers
>


More information about the BangPypers mailing list