Why is this so much faster?

Tim Delaney timothy.c.delaney at gmail.com
Thu Jun 2 18:38:31 EDT 2011


On 3 June 2011 08:07, Keir Rice <keirrice at gmail.com> wrote:

> Hi All,
>
> The following function was showing up in my profiles as a large bottle
> neck:
>
> # Slow version
> def RMSBand(self, histogram):
>        """Calculates the root-mean-squared value for the given colour
> stream histogram."""
>        intermediateResult = map(lambda (i, h): h*(i**2), zip(histogram,
> range(255)))
>        totalSum = sum(intermediateResult)
>
>        # calculate rms
>        return math.sqrt(totalSum / self.Area())
>
> So after a bit of trial and error I came up with the following function
> which is a lot faster:
>
> # Fast version
> def RMSBand(self, histogram):
>        """Calculates the root-mean-squared value for the given colour
> stream histogram."""
>        totalSum = 0
>        for i, h in enumerate(histogram):
>                totalSum += h*(i**2)
>
>        # calculate rms
>        return math.sqrt(totalSum / self.Area())
>
> My question is why is the second method so much faster?
> Is it the removal of the extra function calls?
> Is it skipping the creation of a list?
> Do the basic arithmetic operators get pushed up into C code?
>
> Any insights into the whats really happening behind the scenes would be
> appreciated.
>

First of all, do you have the parameters to the lambda the wrong way around
in the map() version? zip(histogram, range(255)) will return (histogram
value, index), but enumerate(histogram) will return (index, histogram
value). But the parameters haven't been swapped, resulting in the histogram
value being squared in the map version, but the index being squared in the
manual summing version. Depending on the values, this could result in a
large performance increase in the second version (if the total value exceeds
the maximum size of your platform's "long" type).

Ignoring that (though I've set the parameters below to how I think they
*should* be) ...

Probably the biggest savings are list creating and jumping between C- and
Python-functions during the map call. The lambda is a Python function, which
are notoriously slow to use from within map() in comparison to keeping it
all as C-level. Creating intermediate lists is totally unnecessary and
obviously a waste of time. You're actually creating 3 intermediate lists -
one with map(), one with zip() and one with range() (the third only if this
is Python 2.x code, not 3.x).

The best way to find out exactly where the savings are would be to eliminate
one piece of it at a time and see where the savings are (using the timeit
module). For example, you can use a list comprehension to get the same
effect as map() without jumping in and out of python code:

# Use list comprehension instead of map()
def RMSBand(self, histogram):
"""Calculates the root-mean-squared value for the given colour stream
histogram."""
intermediateResult = [h*(i**2) for (h, i) in zip(histogram, range(255))]
totalSum = sum(intermediateResult)

# calculate rms
return math.sqrt(totalSum / self.Area())

# Use itertools.izip() instead of zip()
def RMSBand(self, histogram):
"""Calculates the root-mean-squared value for the given colour stream
histogram."""
        intermediateResult = map(lambda (h, i): h*(i**2),
itertools.izip(histogram, range(255)))
totalSum = sum(intermediateResult)

# calculate rms
return math.sqrt(totalSum / self.Area())

# Use xrange() instead of range()
def RMSBand(self, histogram):
"""Calculates the root-mean-squared value for the given colour stream
histogram."""
        intermediateResult = map(lambda (h, i): h*(i**2), zip(histogram,
xrange(255)))
totalSum = sum(intermediateResult)

# calculate rms
return math.sqrt(totalSum / self.Area())


# Use enumerate() instead of zip(range())
def RMSBand(self, histogram):
"""Calculates the root-mean-squared value for the given colour stream
histogram."""
        intermediateResult = map(lambda (h, i): h*(i**2),
enumerate(histogram))
totalSum = sum(intermediateResult)

# calculate rms
return math.sqrt(totalSum / self.Area())

BTW, the following might (or might not) be faster again:

# Pass a generator expression to sum() instead of manually summing
def RMSBand(self, histogram):
"""Calculates the root-mean-squared value for the given colour stream
histogram."""
totalSum = sum(h*(i**2) for (i, h) in enumerate(histogram))

# calculate rms
return math.sqrt(totalSum / self.Area())

Tim Delaney
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-list/attachments/20110603/8fa49259/attachment-0001.html>


More information about the Python-list mailing list