Better dict of dicts

"Martin v. Löwis" martin at v.loewis.de
Thu Apr 19 17:43:40 EDT 2007


Bill Jackson schrieb:
> I have a dictionary of dictionaries where the keys are typically very
> long tuples and repeated in each inner dictionary. 

What I don't understand here: you say the keys are tuples, yet later,
you show that the keys are strings. Which one is it?

> The dictionary
> representation is nice because it handles sparseness well...and it is
> nice to be able to look up values based on a string rather than a
> number.  However, since my keys are quite long, I worry that I am
> wasting a lot of memory.  I'm looking for better data structures.
> 
> Here is an example:
> 
>>>> a = {"string_1": {"string_2":1,
> ...                   "string_3":5,
> ...                   "string_4":10},
> ...      "string_2": {"string_2":12,
> ...                   "string_6":2,
> ...                   "string_1":4}}
> 
> So as my strings get longer and longer, it seems that the dictionary of
> dictionary representation is less and less efficient.

You seem to assume that every occurrence of "string_2" has separate
memory. That may or may not be the case, depending on where the strings
come from. For example, it may be that the keys are the very same strings:

s1 = "string_1"
s2 = "string_2"

a[s1][s2] = 1
a[s2][s1] = 4

Then, the memory for the string would exist only once.

> My thought was to subclass the dictionary structure....
> 
> keys = {"string_1":1,
>         "string_2":2,
>         "string_3":3,
>         "string_4":4,
>         "string_6":5}

Instead of doing that, I would use a procedure called "interning".
You may want to use the builtin intern function, or your can
come up with your own interning:

interns = {}
def local_intern(s):
  return interns.setdefault(s, s)

Then, instead of

a[k1][k2] = value

do

a[local_intern(k1)][local_intern(k2)] = value

then all strings occur only once as keys, as long as the interns
dictionary isn't cleared.

> Any ideas?

HTH,
Martin



More information about the Python-list mailing list