Adding tuples to a dictionary

Peter Otten __peter__ at web.de
Thu May 31 14:42:33 EDT 2007


Maciej Blizi?ski wrote:

> Hi Pythonistas!
> 
> I've got a question about storing tuples in a dictionary. First, a
> small test case which creates a list of dictionaries:
> 
> import time
> 
> list_of_dicts = []
> keys = [str(x) for x in range(200000)]
> prev_clk = time.clock()
> for i in range(20):
>     my_dict = {}
>     for key in keys:
>         my_dict[key] = key
>         list_of_dicts.append(my_dict)
>     new_clk = time.clock()
>     print i, new_clk - prev_clk
>     prev_clk = new_clk
> 
> It creates dictionaries and stores them in a list, printing out
> execution times. The size of each dictionary is constant, so is the
> execution time for each iteration.
> 
> 0 0.1
> 1 0.1
> 2 0.1
> 3 0.08
> 4 0.09
> 
> ...and so on.
> 
> Then, just one line is changed:
>         my_dict[key] = key
> into:
>         my_dict[key] = (key, key)
> 
> Full code:
> 
> list_of_dicts = []
> keys = [str(x) for x in range(200000)]
> prev_clk = time.clock()
> for i in range(20):
>     my_dict = {}
>     for key in keys:
>         my_dict[key] = (key, key)
>     list_of_dicts.append(my_dict)
>     new_clk = time.clock()
>     print i, new_clk - prev_clk
>     prev_clk = new_clk
> 
> The difference is that instead of single values, tuples are added to
> the dictionary instead. When the program is  run again:
> 
> 0 0.27
> 1 0.37
> 2 0.49
> 3 0.6
> ...
> 16 2.32
> 17 2.45
> 18 2.54
> 19 2.68
> 
> The execution time is rising linearly with every new dictionary
> created.
> 
> Next experiment: dictionaries are not stored in a list, they are just
> left out when an iteration has finished. It's done by removing two
> lines:
> 
> list_of_dicts = []
> 
> and
> 
> list_of_dicts.append(my_dict)
> 
> Full code:
> 
> keys = [str(x) for x in range(200000)]
> prev_clk = time.clock()
> for i in range(20):
>     my_dict = {}
>     for key in keys:
>         my_dict[key] = (key, key)
>     new_clk = time.clock()
>     print i, new_clk - prev_clk
>     prev_clk = new_clk
> 
> The time is constant again:
> 
> 0 0.28
> 1 0.28
> 2 0.28
> 3 0.26
> 4 0.26
> 
> I see no reason for this kind of performance problem, really. It
> happens when both things are true: dictionaries are kept in a list (or
> more generally, in memory) and they store tuples.
> 
> As this goes beyond my understanding of Python internals, I would like
> to kindly ask, if anyone has an idea about how to create this data
> structure (list of dictionaries of tuples, assuming that size of all
> dictionaries is the same), in constant time?

Disable garbage collection:

import gc
gc.disable()

It uses inappropriate heuristics in this case.

Peter



More information about the Python-list mailing list