[issue28201] dict: perturb shift should be done when first conflict

INADA Naoki report at bugs.python.org
Mon Sep 19 00:25:13 EDT 2016


New submission from INADA Naoki:

Current perturb shift code is like following:

    for (perturb = hash; ; perturb >>= PERTURB_SHIFT) {
        i = mask & ((i << 2) + i + perturb + 1);

This loop is start after first conflict. It means perturb == hash for first conflict.

The purpose of perturb shift is avoid long conflict chain when keys more
than two have hashes their lower-bit is same. So first perturb should be hash >> PERTURB_SHIFT.

example: Consider about ma_keys == 16 and keys are [1, 17, 33, 49, 65].
Current perturb
1. hash(1) & (16-1) == 1; 1 uses ix==1 slot.
2. hash(17) & (16-1) == 1; ix==1 conflicts; Next ix is mask & (3 + 17 + 1) == 5; use ix==5 slot.
3. hash(33) & (16-1) == 1; ix==1 conflicts; Next ix is mask & (3 + 33 + 1) == 5; ix==5 conflicts; ...

When first perturb = hash >> PERTURB_SHIFT:
1. hash(1) & (16-1) == 1; 1 uses ix==1 slot.
2. hash(17) & (16-1) == 1; ix==1 conflicts; Next ix is mask & (3 + (17>>5) + 1) == 4; use ix==4 slot.
3. hash(33) & (16-1) == 1; ix==1 conflicts; Next ix is mask & (3 + (33>>5) + 1) == 5; use ix==5 slot.


While it's difficult to see performance difference from benchmark, this should be decrease possibility of 2nd conflict.

----------
components: Interpreter Core
messages: 276936
nosy: methane
priority: normal
severity: normal
status: open
title: dict: perturb shift should be done when first conflict
versions: Python 3.6

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue28201>
_______________________________________


More information about the Python-bugs-list mailing list