[Python-Dev] Idea: more compact, interned string key only dict for namespace.

Mark Shannon mark at hotpy.org
Wed Jun 22 21:30:01 EDT 2016


Hi all,

I think we need some more data before going any further reimplementing 
dicts.

What I would like to know is, across a set of Python programs (ideally a 
representative set), what the proportion of dicts in memory at any one 
time are:

a) instance dicts
b) other namespace dicts (classes and modules)
c) data dicts with all string keys
d) other data dicts
e) keyword argument dicts (I'm guessing this is vanishingly small)

I would expect that (a) far exceeds (b) and depending on the application 
also considerably exceeds (c), but I would like some real data.
 From that we can compute the (approximate) memory costs of the 
competing designs.

As an aside, if anyone is really keen to save memory, then removing the 
cycle GC header is the thing to do.
That uses 24 bytes per object and *half* of all live objects have it.
And don't forget that any Python object is really two objects, the 
object and its dict, so that is 48 extra bytes every time you create a 
new object.


Cheers,
Mark.

On 22/06/16 10:23, INADA Naoki wrote:
> As my last email, compact ordered dict can't preserve
> insertion order of key sharing dict (PEP 412).
>
> I'm thinking about deprecating key shared dict for now.
>
> Instead, my new idea is introducing more compact dict
> specialized for namespace.
>
> If BDFL (or BDFL delegate) likes this idea, I'll take another
> one week to implement this.
>
>
> Background
> ----------------
>
> * Most keys of namespace dict are string.
> * Calculating hash of string is cheap (one memory access, thanks for cache).
> * And most keys are interned already.
>
>
> Design
> ----------
>
> Instead of normal PyDictKeyEntry, use PyInternedKeyEntry like this.
>
> typedef struct {
>      // no me_hash
>      PyObject *me_key, *me_value;
> } PyInternedKeyEntry;
>
>
> insertdict() interns key if it's unicode, otherwise it converts dict to
> normal compact ordered dict.
>
> lookdict_interned() compares only pointer (doesn't call unicode_eq())
> when searching key is interned.
>
> And add new internal API to create interned key only dict.
>
> PyDictObject* _PyDict_NewForNamespace();
>
>
> Memory usage
> --------------------
>
> on amd64 arch.
>
> key-sharing dict:
>
> * 96 bytes for ~3 items
> * 128 bytes for 4~5 items.
>
> compact dict:
>
> * 224 bytes for ~5 items.
>
> (232 bytes when keep supporting key-shared dict)
>
> interned key only dict:
>
> * 184 bytes for ~5 items
>
>
> Note
> ------
>
> Interned key only dict is still larger than key-shared dict.
>
> But it can be used for more purpose.  It can be used for interning string
> for example.  It can be used to kwargs dict when all keys are interned already.
>
> If we provide _PyDict_NewForNamespace to extension modules,
> json decoder can have option to use this, too.
>
>


More information about the Python-Dev mailing list