Time complexity of dictionary insertions
Tim Peters
tim_one at email.msn.com
Sun Apr 25 19:47:17 EDT 1999
[someone asks about the time complexity of Python dict insertions]
[Tim]
> Min O(1), Max O(N), Ave O(1). If the hash function is doing
> a terrible job (e.g. maps every key to the same hash value), make
> those all O(N).
[Moshe Zadka, gets back to the topic <wink>]
> This is interesting. What is the WCS behaviour of Python dicts?
"If the hash function is doing a terrible job (e.g. maps every key to the
same hash value), make those all O(N)." That implies O(N**2) for a sequence
of N inserts.
> but-it-doesn't-really-matter-'cause-it-takes-finite-time-anyway-ly y'rs,
> Z.
Oh, it matters a lot! I'll attach a test case, showing how to consume 4
minutes with about 2000 measily inserts. Python's tuple hash function once
did a very poor job on int 2-tuples of the form (i, i+-1), which made
marching over diagonals of sparse 2D arrays (represented as dicts) a major
exercise in faith <wink>.
if-it-weren't-for-probability-we'd-all-be-dead-ly y'rs - tim
Output from a careless (non-quiet) run, 1.5.2, Win95, P5-166:
Timings for horrid = 0
total # dict entries; elapsed time; ratio to last elapsed
1 0.00
2 0.00 1.17
4 0.00 1.23
8 0.00 1.36
16 0.00 2.46
32 0.00 1.29
64 0.01 1.46
128 0.01 1.58
256 0.02 1.76
512 0.03 1.94
1024 0.06 1.91
2048 0.11 1.96
Timings for horrid = 1
total # dict entries; elapsed time; ratio to last elapsed
1 0.00
2 0.00 2.91
4 0.00 3.35
8 0.00 3.81
16 0.02 4.14
32 0.06 3.90
64 0.28 4.76
128 1.20 4.25
256 3.91 3.26
512 14.66 3.75
1024 57.85 3.95
2048 231.16 4.00
Code:
import time
class Horrid:
N = 0
def __init__(self, horrid=1):
self.value = Horrid.N
Horrid.N = Horrid.N + 1
self.horrid = horrid
def __hash__(self):
if self.horrid:
return 42
else:
return hash(self.value)
def __cmp__(self, other):
return cmp(self.value, other.value)
MAX = 2**11
def timeit(horrid):
clock = time.clock
d = {}
elapsed = 0.0
numtoadd = 1
while 1:
if numtoadd + len(d) > MAX:
break
stufftoadd = map(Horrid, [horrid] * numtoadd)
start = clock()
for thing in stufftoadd:
d[thing] = 1
finish = clock()
lastelapsed = elapsed
elapsed = elapsed + (finish - start)
print "%7d %8.2f" % (len(d), elapsed),
if lastelapsed:
print "%5.2f" % (elapsed / lastelapsed)
else:
print
numtoadd = len(d)
for horrid in 0, 1:
print "\nTimings for horrid =", horrid
print "total # dict entries; elapsed time; ratio to last elapsed"
timeit(horrid)
More information about the Python-list
mailing list