[Python-ideas] Improving memory usage in shared-key attribute dicts

Hill, Bruce bhill at ea.com
Thu Oct 30 20:02:27 CET 2014


Thanks to PEP 412: "Key-Sharing Dictionary", CPython attribute dictionaries can share keys between multiple instances, so the memory cost of new attribute dicts comes primarily from the values array. In the current implementation, the keys array and the values array are always kept to be the same size. This is done so that once the key's location in the key array has been tracked down, the same array offset can be used on the value array to find the value.

Rather than storing values in a sparse array of the same size as the keys array, it would make more sense to store values in a compact array. When a dict uses key sharing, there is an unused field in the PyDictKeyEntry struct ("me_value"), which could be repurposed to hold an index into the value array (perhaps by converting "me_value" into a payload union in PyDictKeyEntry). Since the sparse arrays in dictobject.c never use more than (2n+1)/3 of their entries, this change would reduce the memory footprint of each shared-key dict by roughly 1/3 (or more) and also improve data locality.

With the current code, the memory usage looks something like:
  On the class:
    keys = {NULL, PyDictKeyEntry(key="keyA", value=<unused>), NULL, PyDictKeyEntry(key="keyB", value=<unused>)}
  On every instance:
    values = {NULL, "value A", NULL, "value B"}
  In this example, 50% of the values array is holding NULL pointers.

With the proposed changes, it would look more like:
  On the class:
    keys = {NULL, PyDictKeyEntry(key="keyA", value_index=0), NULL, PyDictKeyEntry(key="keyB", value_index=1)}
  On every instance:
    values = {"value A", "value B"}
  Here, the memory footprint of instance dicts is cut in half because none of the array is wasted.

There's a few implementation details to work out (e.g. uncommon cases like deleting attributes in __init__), but does this seem like a good idea to anyone else?


More information about the Python-ideas mailing list