An IMHO interesting optimization question...
Skip Montanaro
skip at pobox.com
Wed Apr 24 20:08:53 EDT 2002
Mickey> =-------------------------------------------------=
Mickey> # main loop. iterates over standard input as long as there's
Mickey> # something coming in...
Mickey> lines = sys.stdin.readlines()
Mickey> for line in lines:
Mickey> for word in wordExp.findall( line ):
Mickey> count( word )
Mickey> =-------------------------------------------------=
Mickey> This takes 14 seconds on my machine which seems to be quite an
Mickey> achievement.
Mickey> Now on to the next. I try to eliminate the for loop by
Mickey> substituting with a map statement, which is supposed to be
Mickey> faster (because it's looping in C). The 3rd version looks like:
Mickey> =-------------------------------------------------=
Mickey> # main loop. iterates over standard input as long as there's
Mickey> # something coming in...
Mickey> lines = sys.stdin.readlines()
Mickey> for line in lines:
Mickey> map( count, wordExp.findall( line ) )
Mickey> =-------------------------------------------------=
Mickey> Now this takes 17 seconds on my machine. Wow? Why that? Why does
Mickey> it take longer?
The cost to run a loop is the startup/tear down cost plus the cost of
running all the iterations. I believe the startup/tear down cost of map()
is fairly high, while it's per iteration cost is low (high and low being
relative to the cost of the same work using a for loop). You have to
amortize that startup cost over a large number of iterations for it to pay
off. My guess is that your wordExp.findall() call only returns a list of at
most a few elements.
You might try just using map to replace the outer loop. That would
necessitate an interface change to count(). It would then expect a line and
run the inner loop itself.
--
Skip Montanaro (skip at pobox.com - http://www.mojam.com/)
More information about the Python-list
mailing list