[Python-Dev] Tuning Python dicts

Reid Kleckner rnk at mit.edu
Sat Apr 10 23:35:24 CEST 2010


On Sat, Apr 10, 2010 at 4:40 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Reid Kleckner <rnk <at> mit.edu> writes:
>>
>> I think you're right about the number of collisions, though.  CPython
>> dicts use a pretty low load factor (2/3) to keep collision counts
>> down.  One of the major benefits cited in the paper is the ability to
>> maintain performance in the face of higher load factors, so I may be
>> able to bump up the load factor to save memory.  This would increase
>> collisions, but then that wouldn't matter, because resolving them
>> would only require looking within two consecutive cache lines.
>
> Why wouldn't it matter? Hash collisions still involve more CPU work, even though
> if you're not access memory a lot.

So we know for sure that extra CPU work to avoid cache misses is a big
win, but it's never clear whether or not any given memory access will
be a miss.  Because Python's access patterns are rarely random, it may
turn out that all of the elements it accesses are in cache and the
extra CPU work dominates.  If you have a random access pattern over a
large dataset, the dictionary will not fit in cache, and saving random
memory accesses at the expense of CPU will be a win.

Reid


More information about the Python-Dev mailing list