Size in bytes of a dictionary

Don O'Donnell donod at home.com
Wed Sep 19 13:23:51 EDT 2001


Xavier Defrang wrote:
> 
> Dear Pythoneers,
> 
> I haven't been able to find a way to get the total size in bytes of a
> dictionary object.  I'm writing an application that deals with a large
> hashtable (a string-string mapping) and I'd like to be able to monitor its
> size in memory.  Is there any heuristic formula I may use depending on the
> number of keys and the way the interpreter allocates memory for the data
> structure?
> 

David Beazley's excellent "Python Essential Reference" (first edition)
gives the following formulas in Table 3.9, "Memory Size of Built-in
Datatypes":

String        20 bytes + 1 byte per character

Dictionary    24 bytes + 12*2**n bytes, n = log2(nitems)+1


This reference is somewhat dated so these formulas may no longer be
accurate.  The Second Edition may have updated these formulas, but I
haven't bought it yet.
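
For what it's worth, here is a minimal sketch of how those two formulas
could be applied to a string-string mapping.  The function names are my
own, and I am assuming that log2(nitems) is rounded down to an integer
and that an empty dictionary costs just the fixed 24 bytes; the table
does not say either way.  For comparison, the last function uses
sys.getsizeof, which was added in Python 2.6 and reports only the
shallow size of each object, so the two numbers should not be expected
to agree.

```python
import sys


def estimated_str_bytes(s):
    """Table 3.9 estimate for a string: 20 bytes + 1 byte per character."""
    return 20 + len(s)


def estimated_dict_bytes(nitems):
    """Table 3.9 estimate for a dictionary: 24 + 12 * 2**n bytes."""
    if nitems < 1:
        return 24  # assumption: an empty dictionary costs only the header
    # Assumption: n = floor(log2(nitems)) + 1, which equals bit_length().
    n = nitems.bit_length()
    return 24 + 12 * 2 ** n


def estimated_mapping_bytes(mapping):
    """Formula-based estimate for the table plus every key and value string."""
    total = estimated_dict_bytes(len(mapping))
    for key, value in mapping.items():
        total += estimated_str_bytes(key) + estimated_str_bytes(value)
    return total


def measured_mapping_bytes(mapping):
    """Shallow sizes from sys.getsizeof, counting each shared object once."""
    seen = set()
    total = sys.getsizeof(mapping)
    for obj in list(mapping.keys()) + list(mapping.values()):
        if id(obj) not in seen:
            seen.add(id(obj))
            total += sys.getsizeof(obj)
    return total
```

The formula-based estimate counts every value string separately, so it
will overstate the total whenever duplicate values share one object.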

Quoting from further on in this same reference:

"""
Dictionaries are implemented using a hash table with open indexing.  The
number of entries allocated to a dictionary is equal to twice the
smallest power of 2 that's greater than the number of objects stored in
the dictionary.  When a dictionary expands, its size doubles.  On
average, about a half of the entries allocated to a dictionary are
unused.
"""
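
Read literally, the allocation rule in that quote can be sketched as
follows.  The function names are hypothetical and this only restates the
quoted rule; it is not taken from the interpreter source, so a current
interpreter may well behave differently.

```python
def allocated_entries(nitems):
    """Twice the smallest power of 2 that is greater than nitems."""
    power = 1
    while power <= nitems:
        power *= 2
    return 2 * power


def fill_fraction(nitems):
    """Fraction of the allocated entries that are actually in use."""
    return float(nitems) / allocated_entries(nitems)
```

Under this reading, a dictionary holding 5 items would be given 16
entries, and one holding 8 items would be given 32.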

Also beware of automatic interning of strings.  Of course this won't be
an issue with your key strings, since they must be unique, but duplicate
value strings may use the same memory location.  I'm not sure under what
conditions auto-intern takes place but it seems to be related to the
length of the string.  Here's a little test I just ran:

>>> sa = "howdy"
>>> sb = "howdy"
>>> id(sa)
17266544
>>> id(sb)
17266544
>>> la = 'this is a very long string that may not be automatically interned'
>>> lb = 'this is a very long string that may not be automatically interned'
>>> id(la)
17358432
>>> id(lb)
17366928
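
If duplicate values are common, they can be forced to share storage
rather than relying on whatever the interpreter does automatically.  The
sketch below assumes Python 3, where the function is sys.intern (in
Python 2 it is the builtin intern); the helper names are my own.  As far
as I know, CPython automatically interns string constants that look like
identifiers, which might explain the test above better than length does,
but that is an implementation detail and not something to depend on.

```python
import sys


def build(parts):
    """Build a string at run time so it is not a compile-time constant."""
    return " ".join(parts)


def intern_values(mapping):
    """Return a copy of mapping whose equal value strings share one object."""
    return {key: sys.intern(value) for key, value in mapping.items()}


words = ["a", "value", "built", "at", "run", "time"]
raw = {"k1": build(words), "k2": build(words)}

# The two values are equal; whether they are the same object before
# interning is up to the interpreter, so it is not checked here.
values_equal = raw["k1"] == raw["k2"]

shared = intern_values(raw)
values_shared = shared["k1"] is shared["k2"]
```

After intern_values, both keys refer to the same string object, so the
value is stored only once.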

Maybe someone else in this group could shed some light here.

Hope this helps.

Cheers,
Don


