[Python-Dev] Reducing memory overhead for dictionaries by removing me_hash

Sun Apr 23 22:47:23 CEST 2006

[Kirat Singh]
> Hi, this is my first python dev post, so please forgive me if this topic has
> already been discussed.

It's hard to find one that hasn't -- but it's even harder to find the
old discussions ;-)

> It seemed to me that removing me_hash from a dict entry would save 2/3 of
> the space

1/3, right?

> used by dictionaries and also improve alignment of the entries
> since they'd be 8 bytes instead of 12.

How would that help?  On 32-bit boxes, we have 3 4-byte members in
PyDictEntry, and they'll all 4-byte aligned.  In what respect related
to alignment is that sub-optimal?

> And sets end up having just 4 byte entries.
>
> I'm guessing that string dicts are the most common (hence the specialized
> lookupdict_string routine),

String dicts were the only kind at first, and their speed is critical
because Python itself makes heavy use of them (e.g., to implement
instance and module namespaces, and keyword arguments).

> and since strings already contain their hash, this would probably mitigate
> the performance impact.

No slowdown in string dicts would be welcome, but since strings
already cache their own hash, they seem unaffected by this.

It's the speed of other key types that would suffer, and for classes
that define their own __hash__ method that could well be deadly (see
Martin's reply for more detail).

> One could also add a hash to Tuples since they are immutable.

A patch to do that was recently rejected.  You can read its comments
for some of the reasons:

    http://www.python.org/sf/1462796

More reasons were given in a python-dev thread about the same thing
earlier this month:

    http://mail.python.org/pipermail/python-dev/2006-April/063275.html

> If this isn't a totally stupid idea, I'd be happy to volunteer to try the
> experiment and run any suggested tests.

I'd be -1 if it slowed dict operations for classes that define their
own __hash__.  I do a lot of that ;-)

> PS any opinion on making _Py_StringEq a macro?

Yes:  don't bother unless it provably speeds something "important" :-)
 It's kinda messy for a macro otherwise, macros always make debugging
harder (can't step through the source expansion in a debugger w/o a
lot of pain), etc.

> inline function would be nice but I hesitate to bring up the C/C++ debate, both
> languages suck in their own special way ;-)

Does the Python source even compile as C++ now?  People have been
working toward that, but my last impression was that it's not there
yet.