[Python-Dev] Tunable parameters in dictobject.c (was dictnotes.txt out of date?)

Antoine Pitrou solipsis at pitrou.net
Mon Jun 18 23:11:42 CEST 2012


On Mon, 18 Jun 2012 13:46:25 -0700
Raymond Hettinger <raymond.hettinger at gmail.com> wrote:
> 
> On Jun 18, 2012, at 12:35 PM, Antoine Pitrou wrote:
> 
> > You are right. I was thinking 50 nanoseconds (which for a - relatively
> > high-end - 3GHz CPU puts us at 150 cycles).
> 
> The last guidance I read from Intel said that
> a cache miss was roughly as expensive as a floating-point divide.

Floating-point divides are not performance-critical in most Python code,
so the comparison is not very useful. Besides, the statement does not
say which level of cache the miss would occur in.

On recent Intel CPUs a floating-point divide takes between 10
and 24 cycles (*), which could be in line with an L3 cache access (i.e. an
L2 cache miss), but certainly not a main memory access (150+ cycles).

(*) according to http://www.agner.org/optimize/instruction_tables.pdf

(also, modern CPUs have automatic prefetchers which try to hide the
latency of accessing main memory, but AFAIK this only really helps with
large streaming accesses)

> When a dictionary is less sparse, it has more collisions
> which means there are more cache misses.

Only if a single dict access is done (or a very small number of them
relative to the number of stored elements). But if you are doing many
accesses with good temporal locality, then the sparse dict's bigger
size implies a bigger cache footprint and therefore lower efficiency -
not only for the dict accesses themselves, but for the rest of the
code, since more data will get evicted to make room for the dict.
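The footprint side of this is easy to see from Python itself. A rough
illustration (CPython-specific; exact byte counts vary by version and
build): a dict's backing table grows in jumps, so right after a resize
the table is at its sparsest and its per-entry footprint is largest.

```python
import sys

# Track the container's own size (sys.getsizeof ignores keys/values)
# as entries are inserted one by one.
d = {}
sizes = []
for i in range(100):
    d[i] = None
    sizes.append((len(d), sys.getsizeof(d)))

# Points where the container itself grew, i.e. the table was resized.
growth_points = [after for before, after in zip(sizes, sizes[1:])
                 if after[1] != before[1]]
print(growth_points)
```

Right after each of those jumps, the bytes-per-stored-entry ratio - and
hence the cache lines touched per entry - is at its worst.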

Furthermore, total memory consumption can matter regardless of
execution time. RAM is cheap, but it is common to run Python in VMs
with far less memory than a whole physical system.

> For dictionaries large enough to require multiple resizes,
> the 4x factor cuts the number of resizes in half and makes
> each resize faster (because of fewer collisions).  Only the
> growth phase is affected though.

Indeed.
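As a back-of-envelope check (the model below is a deliberate
simplification, not CPython's actual resize rule - it assumes the table
grows by a fixed factor whenever the item count reaches 2/3 of the
table size):

```python
# Count resizes needed to grow a table to n_items under a given
# growth factor, assuming resize triggers at a 2/3 load factor.
def resizes(n_items, factor, initial=8, load=2/3):
    size, count = initial, 0
    while n_items > size * load:
        size *= factor
        count += 1
    return count

for n in (10**3, 10**5, 10**7):
    print(n, resizes(n, 2), resizes(n, 4))
```

Under this model, a 4x factor needs about half as many resizes as a 2x
factor to reach the same size, matching the claim above.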

> From a high-level view, I question efforts to desparsify dictionaries.
> When you have a bucket of water, the weight is in the water, not
> in the bucket itself.

But caches store whole cache lines, not individual bytes, so you do care
about the bucket's size. With 16-byte or 24-byte dict entries and
64-byte cache lines, it is easy to see that a sparse dict could waste a
significant fraction of the cache lines' storage if e.g. only 1 in 3
entries were used (and the used entries were evenly distributed).
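Working those figures through (taking the 24-byte entry size and a 1-in-3
fill as given; actual entry layout depends on the build):

```python
# 64-byte cache lines, 24-byte entries, only 1 in 3 slots occupied
# (evenly spread): roughly two thirds of every cached line holds
# empty slots.
CACHE_LINE = 64
ENTRY_SIZE = 24            # bytes per dict entry on a 64-bit build
FILL = 1 / 3               # occupied fraction of the table

entries_per_line = CACHE_LINE / ENTRY_SIZE   # ~2.67 entries share a line
wasted_fraction = 1 - FILL                   # cached bytes spent on empty slots
print(f"{wasted_fraction:.0%} of cached table bytes are empty slots")
```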

> The actual keys and values dominate dict size
> unless you're reusing the same values over and over again.

It certainly depends a lot on the use case.
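One way to see both sides of that claim (a sketch; sys.getsizeof counts
only the container itself, not the objects it references):

```python
import sys

# With 1000 distinct string keys and values, the contents dominate
# the dict's own footprint; with shared values the balance shifts.
d = {f"key-{i}": f"value-{i}" for i in range(1000)}
container = sys.getsizeof(d)
contents = sum(sys.getsizeof(k) + sys.getsizeof(v) for k, v in d.items())
print(container, contents, contents / container)
```

With many references to the same few objects (small ints, interned
strings, None), the bucket's share grows accordingly.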

Regards

Antoine.
