Time complexity of dictionary insertions

Tim Peters tim_one at email.msn.com
Sun Apr 25 19:47:17 EDT 1999


[someone asks about the time complexity of Python dict insertions]
[Tim]
> Min O(1), Max O(N), Ave O(1).  If the hash function is doing
> a terrible job (e.g. maps every key to the same hash value), make
> those all O(N).

[Moshe Zadka, gets back to the topic <wink>]
> This is interesting. What is the WCS behaviour of Python dicts?

"If the hash function is doing a terrible job (e.g. maps every key to the
same hash value), make those all O(N)."  That implies O(N**2) for a sequence
of N inserts.

> but-it-doesn't-really-matter-'cause-it-takes-finite-time-anyway-ly y'rs,
> Z.

Oh, it matters a lot!  I'll attach a test case, showing how to consume 4
minutes with about 2000 measily inserts.  Python's tuple hash function once
did a very poor job on int 2-tuples of the form (i, i+-1), which made
marching over diagonals of sparse 2D arrays (represented as dicts) a major
exercise in faith  <wink>.

if-it-weren't-for-probability-we'd-all-be-dead-ly y'rs  - tim

Output from a careless (non-quiet) run, 1.5.2, Win95, P5-166:

Timings for horrid = 0
total # dict entries; elapsed time; ratio to last elapsed
      1     0.00
      2     0.00  1.17
      4     0.00  1.23
      8     0.00  1.36
     16     0.00  2.46
     32     0.00  1.29
     64     0.01  1.46
    128     0.01  1.58
    256     0.02  1.76
    512     0.03  1.94
   1024     0.06  1.91
   2048     0.11  1.96

Timings for horrid = 1
total # dict entries; elapsed time; ratio to last elapsed
      1     0.00
      2     0.00  2.91
      4     0.00  3.35
      8     0.00  3.81
     16     0.02  4.14
     32     0.06  3.90
     64     0.28  4.76
    128     1.20  4.25
    256     3.91  3.26
    512    14.66  3.75
   1024    57.85  3.95
   2048   231.16  4.00

Code:

import time

class Horrid:
    N = 0

    def __init__(self, horrid=1):
        self.value = Horrid.N
        Horrid.N = Horrid.N + 1
        self.horrid = horrid

    def __hash__(self):
        if self.horrid:
            return 42
        else:
            return hash(self.value)

    def __cmp__(self, other):
        return cmp(self.value, other.value)

MAX = 2**11

def timeit(horrid):
    clock = time.clock
    d = {}
    elapsed = 0.0
    numtoadd = 1
    while 1:
        if numtoadd + len(d) > MAX:
            break
        stufftoadd = map(Horrid, [horrid] * numtoadd)
        start = clock()
        for thing in stufftoadd:
            d[thing] = 1
        finish = clock()
        lastelapsed = elapsed
        elapsed = elapsed + (finish - start)
        print "%7d %8.2f" % (len(d), elapsed),
        if lastelapsed:
            print "%5.2f" % (elapsed / lastelapsed)
        else:
            print
        numtoadd = len(d)

for horrid in 0, 1:
    print "\nTimings for horrid =", horrid
    print "total # dict entries; elapsed time; ratio to last elapsed"
    timeit(horrid)






More information about the Python-list mailing list