An IMHO interesting optimization question...

Skip Montanaro skip at pobox.com
Wed Apr 24 20:08:53 EDT 2002


    Mickey> =-------------------------------------------------=
    Mickey> # main loop. iterates over standard input as long as there's
    Mickey> # something coming in...
    Mickey> lines = sys.stdin.readlines()
    Mickey> for line in lines:
    Mickey>     for word in wordExp.findall( line ):
    Mickey>         count( word )
    Mickey> =-------------------------------------------------=

    Mickey> This takes 14 seconds on my machine which seems to be quite an
    Mickey> achievement.

    Mickey> Now on to the next. I try to eliminate the for loop by
    Mickey> substituting with a map statement, which is supposed to be
    Mickey> faster (because it's looping in C). The 3rd version looks like:

    Mickey> =-------------------------------------------------=
    Mickey> # main loop. iterates over standard input as long as there's
    Mickey> # something coming in...
    Mickey> lines = sys.stdin.readlines()
    Mickey> for line in lines:
    Mickey>     map( count, wordExp.findall( line ) )
    Mickey> =-------------------------------------------------=

    Mickey> Now this takes 17 seconds on my machine. Wow? Why that? Why does
    Mickey> it take longer?

The cost to run a loop is the startup/tear down cost plus the cost of
running all the iterations.  I believe the startup/tear down cost of map()
is fairly high, while it's per iteration cost is low (high and low being
relative to the cost of the same work using a for loop).  You have to
amortize that startup cost over a large number of iterations for it to pay
off.  My guess is that your wordExp.findall() call only returns a list of at
most a few elements.

You might try just using map to replace the outer loop.  That would
necessitate an interface change to count().  It would then expect a line and
run the inner loop itself.

-- 
Skip Montanaro (skip at pobox.com - http://www.mojam.com/)





More information about the Python-list mailing list