Speed revisited
Bulba!
bulba at bulba.com
Sat Jan 8 19:34:38 EST 2005
On 4 Jan 2005 14:33:34 -0800, "John Machin" <sjmachin at lexicon.net>
wrote:
>(b) Fast forwarding 30+ years, let's look at the dictionary method,
>assuming you have enough memory to hold all your data at once:
>
>Step 1: read the "left" table; for each row, if english not in mydict,
>then do mydict[english] = MyObject(). In any case, do
>mydict[english].left_list.append(row)
>Step 2: same for the "right" table.
>Step 3: for english, obj in mydict.iteritems(): process(english, obj)
>As your datasets are stored in MS Excel spreadsheets, N < 64K so
>whether your solution is O(N) or O(N*log(N)) doesn't matter too much.
>You are however correct to avoid O(N**2) solutions.
Following advice of two posters here (thanks) I have written two
versions of the same program, and both of them work, but the
difference in speed is drastic, about 6 seconds vs 190 seconds
for about 15000 of processed records, taken from 2 lists of
dictionaries.
I've read "Python Performance Tips" at
http://manatee.mojam.com/~skip/python/fastpython.html
..but still don't understand why the difference is so big.
Both versions use local variables, etc. Both have their
lists initially sorted. Both essentially use a loop with
conditional for comparison, then process the record in the
same way. The overhead of second version is that it also
uses cmp() function and two additional integer
variables - that should not slow the program _so much_.
I have measured the run of snippet 2 with time checkpoints
written to a log (write time delta to log every 200 records),
even made a graph of time deltas in spreadsheet and in fact
snippet 2 seems after initial slowdown looks exactly linear,
like that:
^ (time)
|
| /-----------
| /
|/
---------------> (# of records written)
So yes, it would scale to big files. However, why is it so
frigging slow?!
# snippet 1, this runs in about 6 seconds
!def prepend(l):
! map = {}
! for d in l:
! key = d['English']
! map[key] = d
! return map
!
!old_map = prepend(oldl)
!new_map = prepend(newl)
!
!for engphrase in old_map:
! if engphrase in new_map:
! o = old_map[engphrase]
! n = new_map[engphrase]
! cm.writerow(matchpol(o,n))
# snippet 2, this needs 190 seconds
!while 1:
! if len(oldl) == 0 or len(newl) == 0:
! break
! if oldl[o]['English'] == newl[n]['English']:
! cm.writerow(matchpol(oldl[o], newl[n]))
! del oldl[o]
! del newl[n]
! o, n = 0, 0
! continue
! elif cmp(oldl[o]['English'], newl[n]['English']) < 0:
! if o == len(oldl):
! cm.writerow(newl[0])
! del(newl[0])
! o, n = 0, 0
! continue
! o+=1
! elif cmp(oldl[o]['English'], newl[n]['English']) > 0:
! if n == len(newl):
! cm.writerow(newl[0])
! del(oldl[0])
! o, n = 0, 0
! continue
! n+=1
--
It's a man's life in a Python Programming Association.
More information about the Python-list
mailing list