Adding tuples to a dictionary

Nick Craig-Wood nick at craig-wood.com
Fri Jun 1 07:30:04 EDT 2007


bvukov at teletrader.com <bvukov at teletrader.com> wrote:
>  Let me comment on what happens in you're code:
>  The place where you create new objects is
>  keys = [str(x) for x in range(200000)]   # here you create 200000
>  strings which will be reused ( by reference )
>    and
>  my_dict[key] = (key, key) # here you create a new tuple with 2
>  elements
>                              ( both are key, so you're taking a
>  reference of existing key object twice )
>  The tricky part is where you wrote:
>      for key in keys:
>          my_dict[key] = (key, key)
>          list_of_dicts.append(my_dict)  # note that
>  list_of_dicts.append is in the loop! check upstairs!
>  This means that my_dict reference will be stored 200000 times, and it
>  won't be released.
>  statement
>  my_dict = {}
>  will always create new my_dict ( 20 times means 20 new dictionaries )
>  and start over.
>  Since python caches free dictionaries ( after delete - they're used
>  everywhere ),
>  reuse won't happen, and memory will have to be allocated again.
[snip]
>  The reason why you have a growing time comes from the fact that memory
>  allocation takes place instead of object
>  being used by reference. Check the memory usage, and you'll see that
>  test time is pretty much proportional to overall memory usage.

That agrees with my analysis also - it is all about memory usage due
to the duplicated (key, key) tuples.  In fact it puts my machine into
swap at the larger iteration numbers!

Here is storing them in a cache which runs nice and fast!

import time

list_of_dicts = []
keys = [str(x) for x in range(200000)]
tuple_cache = {}
prev_clk = time.clock()
for i in range(20):
    my_dict = {}
    for key in keys:
        value = tuple_cache.setdefault((key, key), (key, key))
        my_dict[key] = value
    list_of_dicts.append(my_dict)
    new_clk = time.clock()
    print i, new_clk - prev_clk
    prev_clk = new_clk

0 1.0
1 0.39
2 0.41
3 0.39
4 0.39
5 0.4
6 0.41
7 0.39
8 0.4
9 0.4
10 0.39
11 0.4
12 0.4
13 0.39
14 0.4
15 0.4
16 0.39
17 0.4
18 0.39
19 0.41

Note the first iteration is slower as it builds the tuple cache

-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick



More information about the Python-list mailing list