Populating a dictionary, fast [SOLVED SOLVED]

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Thu Nov 15 17:10:02 EST 2007


On Thu, 15 Nov 2007 21:51:21 +0100, Hrvoje Niksic wrote:

> Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au> writes:
> 
>>>> Someone please summarize.
>>> 
>>> Yes, that would be good.
>>
>> On systems with multiple CPUs or 64-bit systems, or both, creating
>> and/or deleting a multi-megabyte dictionary in recent versions of
>> Python (2.3, 2.4, 2.5 at least) takes a LONG time, of the order of 30+
>> minutes, compared to seconds if the system only has a single CPU.
> 
> Can you post minimal code that exhibits this behavior on Python 2.5.1?
> The OP posted a lot of different versions, most of which worked just
> fine for most people.

Most of whom were testing it on single-CPU, 32-bit systems.

The plot thickens... I wrote another version of my test code, reading the 
data into a list of tuples rather than a dict:

$ python slurp_dict4.py  # actually slurp a list, despite the name
Starting at Fri Nov 16 08:55:26 2007
Line 0
Line 1000000
Line 2000000
Line 3000000
Line 4000000
Line 5000000
Line 6000000
Line 7000000
Line 8000000
Items in list: 8191180
Completed import at Fri Nov 16 08:56:26 2007
Starting to delete list...
Completed deletion at Fri Nov 16 08:57:04 2007
Finishing at Fri Nov 16 08:57:04 2007

Quite a reasonable speed, considering my limited memory.
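The post doesn't show the code behind slurp_dict4.py. Below is a minimal sketch of that kind of test. It assumes synthetic (string key, int value) records instead of the original data file, and it is written for Python 3 (time.perf_counter) rather than the 2.x versions discussed here. `make_records` and `time_build_and_delete` are hypothetical names:

```python
import time

def make_records(n):
    # Hypothetical stand-in for the original input file, which the post
    # does not show: n (string key, int value) pairs.
    return (("key%d" % i, i) for i in range(n))

def time_build_and_delete(n, container="list"):
    """Return (item_count, build_seconds, delete_seconds) for a list or dict."""
    start = time.perf_counter()
    if container == "list":
        data = list(make_records(n))   # list of tuples, as in slurp_dict4.py
    elif container == "dict":
        data = dict(make_records(n))   # the dict version that misbehaved
    else:
        raise ValueError("container must be 'list' or 'dict'")
    built = time.perf_counter()
    size = len(data)
    del data  # drop the only reference so the container is freed right here
    deleted = time.perf_counter()
    return size, built - start, deleted - built

if __name__ == "__main__":
    for kind in ("list", "dict"):
        size, build, delete = time_build_and_delete(1000000, kind)
        print("%s: %d items, built in %.2fs, deleted in %.2fs"
              % (kind, size, build, delete))
```

Running the list and dict variants on the same records and the same machine is what isolates the container type as the variable.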

What do we know so far?

(1) The problem occurs whether or not gc is enabled.

(2) It only occurs on some architectures. A 64-bit CPU seems to be the 
common factor.

(3) I don't think we've seen it demonstrated under Windows, but we've 
seen it under at least two different Linux distros.

(4) It affects very large dicts, but not very large lists.

(5) I've done tests where instead of one really big dict, the data is put 
into lots of smaller dicts. The problem still occurs.

(6) It was suggested that the problem is related to long/int unification, 
but I've done tests that kept the dict keys as strings, and the problem 
still occurs.

(7) It occurs in Python 2.3, 2.4 and 2.5, but not 2.5.1.
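Points (1), (5) and (6) could be checked together in one harness that varies the gc state, the number of dicts and the key type. This is a sketch, not the test code used in this thread. `build_dicts`, `run_trial` and the tuple payload are hypothetical. It targets Python 3, so it will not necessarily reproduce the 2.x slowdown itself:

```python
import gc
import time

def build_dicts(n, n_dicts=1, str_keys=True):
    # Spread n items across n_dicts dicts (point 5); keys are strings or
    # ints (point 6). The (i, i + 1) value is a hypothetical payload.
    dicts = [{} for _ in range(n_dicts)]
    for i in range(n):
        key = str(i) if str_keys else i
        dicts[i % n_dicts][key] = (i, i + 1)
    return dicts

def run_trial(n, n_dicts=1, str_keys=True, gc_enabled=True):
    """Time building and deleting the dicts (point 1: gc on or off).

    Returns (item_count, build_seconds, delete_seconds) and restores the
    gc state that was in effect before the call.
    """
    was_enabled = gc.isenabled()
    if gc_enabled:
        gc.enable()
    else:
        gc.disable()
    try:
        start = time.perf_counter()
        dicts = build_dicts(n, n_dicts, str_keys)
        built = time.perf_counter()
        total = sum(len(d) for d in dicts)
        del dicts  # free every dict; deallocation time is what we measure
        deleted = time.perf_counter()
    finally:
        if was_enabled:
            gc.enable()
        else:
            gc.disable()
    return total, built - start, deleted - built

if __name__ == "__main__":
    n = 1000000
    for gc_on in (True, False):
        for n_dicts in (1, 1000):
            for str_keys in (True, False):
                total, build, delete = run_trial(n, n_dicts, str_keys, gc_on)
                print("gc=%-5s dicts=%-4d str_keys=%-5s items=%d "
                      "build=%.2fs delete=%.2fs"
                      % (gc_on, n_dicts, str_keys, total, build, delete))
```

Each factor gets its own row, so a slowdown that tracks only one of them stands out. The gc state is restored after every trial so one run can't skew the next.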

Do we treat this as a solved problem and move on?



-- 
Steven.
