Interesting timing issue I noticed

Dan Upton upton at virginia.edu
Wed Apr 16 15:06:55 EDT 2008


On Wed, Apr 16, 2008 at 2:54 PM, Gabriel Genellina
<gagsl-py2 at yahoo.com.ar> wrote:
> On Wed, 16 Apr 2008 10:36:14 -0300, Jonathan Shao <jzshao1 at gmail.com>
>  wrote:
>  > *Gabriel Genellina* gagsl-py2 at yahoo.com.ar
>  > *Wed Apr 16 08:44:10 CEST 2008*
>
> >> Another thing would be to rearrange the loops so the outer one
>  >> executes fewer times; if you know that borderX<<sizeX and
>  >> borderY<<sizeY it may be better to swap the inner and outer loops
>  >> above.
>  > Thank you for the tip on xrange.
>  > Even if I swap the inner and outer loops, I would still be doing the
>  > same number of computations, am I right (since I still need to go
>  > through the same number of elements)? I'm not seeing how a loop swap
>  > would lead to fewer computations, since I still need to calculate the
>  > outer rim of elements in the array (defined by borderX and borderY).
>
>  You minimize the number of "for" statements executed, and the number of
>  xrange objects created. Both take some time in Python.
>
>  <code>
>  import timeit
>
>  f1 = """
>  for i in xrange(10):
>    for j in xrange(1000):
>      i,j
>  """
>
>  f2 = """
>  for j in xrange(1000):
>    for i in xrange(10):
>      i,j
>  """
>
>  print timeit.Timer(f1).repeat(number=1000)
>  print timeit.Timer(f2).repeat(number=1000)
>  </code>
>
>  Output:
>  [2.0405478908632233, 2.041863979919242, 2.0397852240997167]
>  [2.8623411634718821, 2.8330055914927783, 2.8361752680857535]
>
>  The second (using the larger outer loop) is almost 40% slower than the
>  first one. "Simplified" Big-Oh complexity analysis isn't enough in cases
>  like this.
>
>  --
>  Gabriel Genellina
>
>

For large multidimensional arrays you may also see speed differences
depending on traversal order, due to cache effects.  For instance, if
an array is stored row-major, then processing it a row at a time gives
you a run of contiguous memory accesses: the cache loads a whole line
at a time, so you get several hits per load.  Accessing it by column,
on the other hand, means almost every access lands on a different
cache line, so you'll probably have to go out to memory a lot
(depending on whether the hardware has a prefetcher, or how good it
is, I suppose).
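
In case it helps, here's a rough sketch in the spirit of Gabriel's
timeit snippet (my own illustration, not measured numbers).  It fakes
an n x n row-major array with one flat list and walks it row-by-row
versus column-by-column; the indexing expression is identical in both,
so only the access pattern differs.  One caveat: a Python list stores
pointers to objects rather than raw values, so the effect is far
weaker than it would be for a C array, but the idea is the same.

<code>
import timeit

# simulate an n x n array, stored row-major, as one flat list
setup = "n = 1000; a = [0] * (n * n)"

# row-by-row: consecutive indices, so accesses are contiguous
by_row = """
for i in xrange(n):
    for j in xrange(n):
        a[i * n + j]
"""

# column-by-column: same expression, but each step jumps n elements
by_col = """
for j in xrange(n):
    for i in xrange(n):
        a[i * n + j]
"""

print timeit.Timer(by_row, setup).repeat(number=10)
print timeit.Timer(by_col, setup).repeat(number=10)
</code>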

Just something to consider.


