Speed revisited

Sat Jan 8 19:34:38 EST 2005

On 4 Jan 2005 14:33:34 -0800, "John Machin" <sjmachin at lexicon.net>
wrote:

>(b) Fast forwarding 30+ years, let's look at the dictionary method,
>assuming you have enough memory to hold all your data at once:
>
>Step 1: read the "left" table; for each row, if english not in mydict,
>then do mydict[english] = MyObject(). In any case, do
>mydict[english].left_list.append(row)
>Step 2: same for the "right" table.
>Step 3: for english, obj in mydict.iteritems(): process(english, obj)

>As your datasets are stored in MS Excel spreadsheets, N < 64K so
>whether your solution is O(N) or O(N*log(N)) doesn't matter too much.
>You are however correct to avoid O(N**2) solutions.

Following advice of two posters here (thanks) I have written two
versions of  the same program, and both of them work, but the
difference in speed is drastic, about 6 seconds vs 190 seconds
for about 15000 of processed records, taken from 2 lists of
dictionaries.

I've read "Python Performance Tips" at

http://manatee.mojam.com/~skip/python/fastpython.html

..but still don't understand why the difference is so big. 

Both versions use local variables, etc. Both have their
lists initially sorted. Both essentially use a loop with
conditional for comparison, then process the record in the 
same way. The overhead of second version is that it also 
uses cmp() function and two additional integer 
variables - that should not slow the program _so much_.

I have measured the run of snippet 2 with time checkpoints
written to a log (write time delta to log every 200 records),
even made a graph of time deltas in spreadsheet and in fact 
snippet 2 seems after initial slowdown looks exactly linear, 
like  that:

^ (time)
|
|  /-----------
| /
|/
---------------> (# of records written)

So yes, it would scale to big files. However, why is it so 
frigging slow?! 

# snippet 1, this runs in about 6 seconds
!def prepend(l):
!    map = {}
!    for d in l:
!        key = d['English']
!        map[key] = d
!    return map
!
!old_map = prepend(oldl)
!new_map = prepend(newl)
!
!for engphrase in old_map:
!     if engphrase in new_map:
!         o = old_map[engphrase]
!         n = new_map[engphrase]
!         cm.writerow(matchpol(o,n))

# snippet 2, this needs 190 seconds
!while 1:
!    if len(oldl) == 0 or len(newl) == 0:
!        break
!    if oldl[o]['English'] == newl[n]['English']:
!        cm.writerow(matchpol(oldl[o], newl[n]))
!        del oldl[o]
!        del newl[n]
!        o, n = 0, 0
!        continue
!    elif cmp(oldl[o]['English'], newl[n]['English']) < 0:
!        if o == len(oldl):
!            cm.writerow(newl[0])
!            del(newl[0])
!            o, n = 0, 0
!            continue
!        o+=1
!    elif cmp(oldl[o]['English'], newl[n]['English']) > 0:
!        if n == len(newl):
!            cm.writerow(newl[0])
!            del(oldl[0])
!            o, n = 0, 0
!            continue
!        n+=1    

--
It's a man's life in a Python Programming Association.