[Python-Dev] Reducing memory overhead for dictionaries by removing me_hash

Kirat Singh kirat.singh at gmail.com
Sun Apr 23 23:03:37 CEST 2006


Interesting, thanks for the responses. And yeah I meant 1/3, I always mix up
negatives.

Agree that as you point out the biggest slowdown will be on classes that
define their own __hash__, however since classes use instancedicts and this
would reduce the dict size from 96 -> 64 bytes, we could blow 4 bytes to
cache the hash on the object.

In fact PyObject_Hash could 'intern' the result of __hash__ into a
__hashvalue__ member of the class dict. This might be the best of both
worlds since it'll only use space for the hashvalue if its needed.

Oh and the reason I brought up strings was that one can grab the ob_shash
from the stringobject in lookupdict_string to avoid even the function call
to get the hash for a string, so its just the same as storing the hash on
the entry for strings.

The reason I looked into this to begin with was that my code used up a bunch
of memory which was traceable to lots of little objects with instance dicts,
so it seemed that if instancedicts took less memory I wouldn't have to go
and add __slots__ to a bunch of my classes, or rewrite things as
tuples/lists, etc.

thanks!
-Kirat

On 4/23/06, Tim Peters <tim.peters at gmail.com> wrote:
>
> [Kirat Singh]
> > Hi, this is my first python dev post, so please forgive me if this topic
> has
> > already been discussed.
>
> It's hard to find one that hasn't -- but it's even harder to find the
> old discussions ;-)
>
> > It seemed to me that removing me_hash from a dict entry would save 2/3
> of
> > the space
>
> 1/3, right?
>
> > used by dictionaries and also improve alignment of the entries
> > since they'd be 8 bytes instead of 12.
>
> How would that help?  On 32-bit boxes, we have 3 4-byte members in
> PyDictEntry, and they'll all 4-byte aligned.  In what respect related
> to alignment is that sub-optimal?
>
> > And sets end up having just 4 byte entries.
> >
> > I'm guessing that string dicts are the most common (hence the
> specialized
> > lookupdict_string routine),
>
> String dicts were the only kind at first, and their speed is critical
> because Python itself makes heavy use of them (e.g., to implement
> instance and module namespaces, and keyword arguments).
>
> > and since strings already contain their hash, this would probably
> mitigate
> > the performance impact.
>
> No slowdown in string dicts would be welcome, but since strings
> already cache their own hash, they seem unaffected by this.
>
> It's the speed of other key types that would suffer, and for classes
> that define their own __hash__ method that could well be deadly (see
> Martin's reply for more detail).
>
> > One could also add a hash to Tuples since they are immutable.
>
> A patch to do that was recently rejected.  You can read its comments
> for some of the reasons:
>
>     http://www.python.org/sf/1462796
>
> More reasons were given in a python-dev thread about the same thing
> earlier this month:
>
>     http://mail.python.org/pipermail/python-dev/2006-April/063275.html
>
> > If this isn't a totally stupid idea, I'd be happy to volunteer to try
> the
> > experiment and run any suggested tests.
>
> I'd be -1 if it slowed dict operations for classes that define their
> own __hash__.  I do a lot of that ;-)
>
> > PS any opinion on making _Py_StringEq a macro?
>
> Yes:  don't bother unless it provably speeds something "important" :-)
> It's kinda messy for a macro otherwise, macros always make debugging
> harder (can't step through the source expansion in a debugger w/o a
> lot of pain), etc.
>
> > inline function would be nice but I hesitate to bring up the C/C++
> debate, both
> > languages suck in their own special way ;-)
>
> Does the Python source even compile as C++ now?  People have been
> working toward that, but my last impression was that it's not there
> yet.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-dev/attachments/20060423/518a4e11/attachment.htm 


More information about the Python-Dev mailing list