Adding tuples to a dictionary
Nick Craig-Wood
nick at craig-wood.com
Fri Jun 1 07:30:04 EDT 2007
bvukov at teletrader.com <bvukov at teletrader.com> wrote:
> Let me comment on what happens in you're code:
> The place where you create new objects is
> keys = [str(x) for x in range(200000)] # here you create 200000
> strings which will be reused ( by reference )
> and
> my_dict[key] = (key, key) # here you create a new tuple with 2
> elements
> ( both are key, so you're taking a
> reference of existing key object twice )
> The tricky part is where you wrote:
> for key in keys:
> my_dict[key] = (key, key)
> list_of_dicts.append(my_dict) # note that
> list_of_dicts.append is in the loop! check upstairs!
> This means that my_dict reference will be stored 200000 times, and it
> won't be released.
> statement
> my_dict = {}
> will always create new my_dict ( 20 times means 20 new dictionaries )
> and start over.
> Since python caches free dictionaries ( after delete - they're used
> everywhere ),
> reuse won't happen, and memory will have to be allocated again.
[snip]
> The reason why you have a growing time comes from the fact that memory
> allocation takes place instead of object
> being used by reference. Check the memory usage, and you'll see that
> test time is pretty much proportional to overall memory usage.
That agrees with my analysis also - it is all about memory usage due
to the duplicated (key, key) tuples. In fact it puts my machine into
swap at the larger iteration numbers!
Here is storing them in a cache which runs nice and fast!
import time
list_of_dicts = []
keys = [str(x) for x in range(200000)]
tuple_cache = {}
prev_clk = time.clock()
for i in range(20):
my_dict = {}
for key in keys:
value = tuple_cache.setdefault((key, key), (key, key))
my_dict[key] = value
list_of_dicts.append(my_dict)
new_clk = time.clock()
print i, new_clk - prev_clk
prev_clk = new_clk
0 1.0
1 0.39
2 0.41
3 0.39
4 0.39
5 0.4
6 0.41
7 0.39
8 0.4
9 0.4
10 0.39
11 0.4
12 0.4
13 0.39
14 0.4
15 0.4
16 0.39
17 0.4
18 0.39
19 0.41
Note the first iteration is slower as it builds the tuple cache
--
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick
More information about the Python-list
mailing list