Populating a dictionary, fast

Michael Bacarella mbac at gpshopper.com
Sat Nov 10 16:56:35 EST 2007


The id2name.txt file maps primary keys to strings.  Its lines look like this:

11293102971459182412:Descriptive unique name for this record\n
950918240981208142:Another name for another record\n
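
To make the format concrete, here is a minimal sketch of how one such line breaks down into a key and a name. The helper name parse_line is made up for illustration and is not part of my loader. It uses int rather than long so the snippet also runs on Python 3 (on Python 2, int() promotes large values to long automatically). The trailing \n in the samples above stands for the actual newline character.

```python
def parse_line(line):
    # Hypothetical helper: turn one 'ID:NAME' line into (integer key, name).
    # Split on the first ':' only, in case a name itself contains a colon.
    key, name = line.rstrip('\n').split(':', 1)
    return int(key), name

sample = '11293102971459182412:Descriptive unique name for this record\n'
key, name = parse_line(sample)
```

Note that the first sample key does not fit in a signed 64-bit integer, so the keys really do need arbitrary-precision integers.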

The file's properties are:

# wc -l id2name.txt
8191180 id2name.txt
# du -h id2name.txt
517M    id2name.txt

I'm loading the file into memory with code like this:

id2name = {}
for line in iter(open('id2name.txt').readline,''):
    # Split on the first ':' only, so a name may itself contain a colon.
    id,name = line.strip().split(':', 1)
    id = long(id)
    id2name[id] = name

This takes about 45 *minutes*.

If I comment out the last line in the loop body it takes only about 30 _seconds_ to run.
This would seem to implicate the line id2name[id] = name as being excruciatingly slow.
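
For anyone who wants to repeat the comparison, the following is a rough sketch of how the two variants can be timed side by side. Everything in it is an assumption for illustration: the function names load and timed, the synthetic records, and the record count, which is far smaller than the real file. It uses int instead of long so it runs on Python 3 as well. At this size it is not expected to reproduce the 45 minute figure; it only shows the shape of the measurement.

```python
import os
import tempfile
import time

def load(path, store):
    # Parse every line; insert into the dict only when store is True.
    id2name = {}
    with open(path) as f:
        for line in f:
            key, name = line.strip().split(':', 1)
            key = int(key)
            if store:
                id2name[key] = name
    return id2name

def timed(path, store):
    # Return (elapsed wall-clock seconds, resulting dict).
    start = time.time()
    result = load(path, store)
    return time.time() - start, result

# Build a small synthetic file (made-up data, not the real id2name.txt).
N_RECORDS = 100000
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    for i in range(N_RECORDS):
        f.write('%d:name for record %d\n' % (10**18 + i, i))

parse_only_secs, empty = timed(path, store=False)
with_insert_secs, table = timed(path, store=True)
os.remove(path)

print('parse only:  %.3f s' % parse_only_secs)
print('with insert: %.3f s' % with_insert_secs)
```

The store flag plays the role of commenting out the last line of the loop, so both runs do identical parsing work and differ only in the dictionary insert.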

Is there a fast, functionally equivalent way of doing this?

(Yes, I really do need this cached.  No, an RDBMS or disk-based hash is not fast enough.)





