Interesting timing issue I noticed
Dan Upton
upton at virginia.edu
Wed Apr 16 15:06:55 EDT 2008
On Wed, Apr 16, 2008 at 2:54 PM, Gabriel Genellina
<gagsl-py2 at yahoo.com.ar> wrote:
> En Wed, 16 Apr 2008 10:36:14 -0300, Jonathan Shao <jzshao1 at gmail.com>
> escribió:
>
> >> Another thing would be to rearrange the loops so the outer one
> >> executes fewer times; if you know that borderX<<sizeX and
> >> borderY<<sizeY it may be better to swap the inner and outer loops
> >> above.
> > Thank you for the tip on xrange.
> > Even if I swap the inner and outer loops, I would still be doing the
> > same number of computations, am I right (since I still need to go
> > through the same number of elements)? I'm not seeing how a loop swap
> > would lead to fewer computations, since I still need to calculate the
> > outer rim of elements in the array (defined by borderX and borderY).
>
> You minimize the number of "for" statements executed, and the number of
> xrange objects created. Both take some time in Python.
>
> <code>
> import timeit
>
> f1 = """
> for i in xrange(10):
>     for j in xrange(1000):
>         i,j
> """
>
> f2 = """
> for j in xrange(1000):
>     for i in xrange(10):
>         i,j
> """
>
> print timeit.Timer(f1).repeat(number=1000)
> print timeit.Timer(f2).repeat(number=1000)
> </code>
>
> Output:
> [2.0405478908632233, 2.041863979919242, 2.0397852240997167]
> [2.8623411634718821, 2.8330055914927783, 2.8361752680857535]
>
> The second version (with the larger range in the outer loop) is almost
> 40% slower than the first. "Simplified" Big-Oh complexity analysis isn't
> enough in cases like this.
>
> --
> Gabriel Genellina
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>
For large multidimensional arrays you may also see speed differences
depending on traversal order, due to cache effects. For instance, if an
array is stored row-major, then processing it a row at a time gives you a
run of memory accesses that are contiguous in memory (the cache loads a
line at a time, so you get several hits per load), while accessing it by
column means you'll probably have to go out to memory a lot (depending on
whether the hardware has a prefetcher, and how good it is, I suppose).
Just something to consider.
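A quick sketch of the idea, using plain nested lists to stand in for a
2-D row-major array (my own toy example, not code from the thread; note
that in CPython the pointer indirection of list-of-lists blurs the pure
cache effect, though the column-wise version also pays for re-indexing a
different row object on every access):

```python
import timeit

N = 500
# N x N "array" stored as a list of rows (row-major layout)
a = [[i * N + j for j in range(N)] for i in range(N)]

def sum_by_rows(a):
    # Row-at-a-time: each inner loop walks one row's contiguous list.
    total = 0
    for row in a:
        for x in row:
            total += x
    return total

def sum_by_cols(a):
    # Column-at-a-time: every access indexes into a different row.
    total = 0
    for j in range(N):
        for i in range(N):
            total += a[i][j]
    return total

t_rows = timeit.Timer(lambda: sum_by_rows(a)).timeit(number=5)
t_cols = timeit.Timer(lambda: sum_by_cols(a)).timeit(number=5)
print("by rows:", t_rows)
print("by cols:", t_cols)
```

Both traversals compute the same sum, of course; only the access pattern
(and therefore the timing) differs. With NumPy arrays, where the data
really is one contiguous buffer, the gap tends to be more pronounced.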