[Python-Dev] Tuning Python dicts

Reid Kleckner rnk at mit.edu
Sat Apr 10 22:32:25 CEST 2010


On Sat, Apr 10, 2010 at 2:57 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> Any other advice would also be helpful.
>
> I may be missing the point, but ISTM that the assumption of this
> approach is that there are often collisions in the hash table. I think
> that assumption is false; at least, I recommend to validate that
> assumption before proceeding.

It's just an experiment for a class, not something I am (yet)
seriously thinking about contributing back to CPython.  I think my
chances of improving over the current implementation are slim.  I do
not have that much hubris.  :)  I would just rather do experimental
rather than theoretical work with data structures.

I think you're right about the number of collisions, though.  CPython
dicts use a pretty low load factor (2/3) to keep collision counts
down.  One of the major benefits cited in the paper is the ability to
maintain performance in the face of higher load factors, so I may be
able to bump up the load factor to save memory.  This would increase
collisions, but then that wouldn't matter, because resolving them
would only require looking within two consecutive cache lines.

> There are, of course, cases where dicts do show collisions (namely when
> all keys hash equal), however, I'm uncertain whether the approach would
> help in that case.

Yes, in fact, hopscotch hashing fails completely as I mentioned at the
end of my last message.

Reid


More information about the Python-Dev mailing list