Orders of magnitude
Robert Brewer
fumanchu at amor.org
Mon Mar 29 01:47:00 EST 2004
x = [chr(a) + chr(b) for a in xrange(100) for b in xrange(100)]
# list version: c = []
for i in x:
if i not in c:
c.append(i)
>>> t1.timeit(1)
2.0331145438875637
# dict version: c = {}
for i in x:
if i not in c:
c[i] = None
>>> t2.timeit(1)
0.0067952770534134288
# bsddb version: c = bsddb.btopen(None)
for i in x:
if i not in c:
c[i] = None
>>> t3.timeit(1)
0.18430750276922936
Wow. Dicts are *fast*.
I'm dedup'ing a 10-million-record dataset, trying different approaches
for building indexes. The in-memory dicts are clearly faster, but I get
Memory Errors (Win2k, 512 MB RAM, 4 G virtual). Any recommendations on
other ways to build a large index without slowing down by a factor of
25?
Robert Brewer
MIS
Amor Ministries
fumanchu at amor.org
More information about the Python-list
mailing list