[Python-ideas] Reducing collisions in small dicts/sets
Tim Peters
tim.peters at gmail.com
Mon Jun 26 15:09:35 EDT 2017
[MRAB <python at mrabarnett.plus.com>]
> If the current scheme suffers only for small tables, couldn't you use an
> alternative scheme only for small tables?
Sure. But whether that's desirable partly depends on timing actual C
code. Try it ;-) For maintenance sanity, it's obviously better to
have only one scheme to wrestle with.
Note that "may visit a slot more than once" isn't the only thing in
play, just one of the seemingly important things. For example, the
current scheme requires 3 adds, 2 shifts, and a mask on each loop
iteration (or 1 add and 1 shift of those could be replaced by 1
multiplication). The double-hashing schemes only require 1 add and 1
mask per iteration.
In cases of collision, that difference is probably swamped by waiting
for cache misses. But, as I said in the first msg:
"""
This is a brain dump for someone who's willing to endure the
interminable pain of arguing about
benchmarks ;-)
"""
More information about the Python-ideas
mailing list