Populating a dictionary, fast [SOLVED SOLVED]

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Fri Nov 16 17:20:34 EST 2007


On Fri, 16 Nov 2007 11:24:24 +0100, Hrvoje Niksic wrote:

> Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au> writes:
> 
>>> Can you post minimal code that exhibits this behavior on Python 2.5.1?
>>> The OP posted a lot of different versions, most of which worked just
>>> fine for most people.
>>
>> Who were testing it on single-CPU, 32 bit systems.
> 
> Still, I'd like to see a test case that fails (works slowly) for you, so
> that I (and others) can try it on different machines.  As I said, the OP
> posted several versions of his code, and tended to depend on a specific
> dataset.  A minimal test case would help more people debug it.


http://groups.google.com.au/group/comp.lang.python/msg/b33ceaf01db10a86


#!/usr/bin/python
"""Read a big file into a dict."""

import gc
import time
print "Starting at %s" % time.asctime()
flag = gc.isenabled()
gc.disable()
id2name = {}
for n, line in enumerate(open('id2name.txt', 'r')):
    if n % 1000000 == 0:
        # Give feedback.
        print "Line %d" % n
    id, name = line.strip().split(':', 1)
    id = long(id)
    id2name[id] = name
print "Items in dict:", len(id2name)
print "Completed import at %s" % time.asctime()
print "Starting to delete dict..."
del id2name
print "Completed deletion at %s" % time.asctime()
if flag:
    gc.enable()
print "Finishing at %s" % time.asctime() 


I've since tried variants where the dict keys were kept as strings, which 
made no difference to the speed, and where the data was kept as a list of 
(key, value) tuples, which made a HUGE difference.
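For reference, the list-of-tuples variant looked roughly like this (a minimal sketch on synthetic data with made-up names, not the exact code I ran against id2name.txt):

```python
# Sketch: building the same data as a dict vs. as a list of
# (key, value) tuples. The dict hashes every key on insert;
# the list only appends.
import time

def build_dict(pairs):
    # Dict version: one hash-table insertion per record.
    d = {}
    for key, value in pairs:
        d[key] = value
    return d

def build_list(pairs):
    # List-of-tuples version: appends only, no hashing on insert.
    items = []
    for key, value in pairs:
        items.append((key, value))
    return items

pairs = [(n, "name%d" % n) for n in range(1000000)]

t0 = time.time()
d = build_dict(pairs)
t1 = time.time()
lst = build_list(pairs)
t2 = time.time()

print("dict: %.2fs" % (t1 - t0))
print("list: %.2fs" % (t2 - t1))
```

Of course the list gives up O(1) lookup, so it only helps if you don't need random access by key.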

You should also read this post here:
http://groups.google.com.au/group/comp.lang.python/msg/2535dc213bc45f84


showing Perl running very fast on the same machine where Python was 
running like a one-legged sloth.



-- 
Steven.


