Numpy slow at vector cross product?

BartC bc at freeuk.com
Mon Nov 21 20:45:50 EST 2016


On 21/11/2016 14:50, Steve D'Aprano wrote:
> On Mon, 21 Nov 2016 11:09 pm, BartC wrote:

> Modern machines run multi-tasking operating systems, where there can be
> other processes running. Depending on what you use as your timer, you may
> be measuring the time that those other processes run. The OS can cache
> frequently used pieces of code, which allows it to run faster. The CPU
> itself will cache some code.

You get to know after a while what kinds of processes affect timings, 
for example streaming a movie at the same time. So when you need to 
compare timings, you turn those off.

> The shorter the code snippet, the more these complications are relevant. In
> this particular case, we can be reasonably sure that the time it takes to
> create a list range(10000) and the overhead of the loop is *probably* quite
> a small percentage of the time it takes to perform 100000 vector
> multiplications. But that's not a safe assumption for all code snippets.

Yes, it was one of those crazy things that Python used to have to do, 
creating a list of N numbers just in order to be able to count to N.

But that's not significant here. Either experience, or a preliminary 
test with an empty loop, or using xrange, or using Py3, will show that 
the loop overheads for N iterations in this case are small in comparison 
to executing the bodies of the loops.
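A quick way to confirm that is to time an empty loop of the same length 
first and compare it against the real run (a rough sketch, assuming 
Python 3 and an illustrative count):

import time

N = 100000
t0 = time.time()
for i in range(N):
    pass                      # empty loop: pure iteration overhead
t1 = time.time()
print("empty loop:", t1 - t0, "seconds for", N, "iterations")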

> This is why the timeit module exists: to do the right thing when it matters,
> so that you don't have to think about whether or not it matters. The timeit
> module works really really hard to get good quality, accurate timings,
> minimizing any potential overhead.
>
> The timeit module automates a bunch of tricky-to-get-right best practices for
> timing code. Is that a problem?

The problem is that it substitutes a bunch of tricky-to-get-right 
options and syntax of its own, which have to be typed /at the command 
line/. And you really don't want to have to write code at the command 
line (especially if it is sourced from elsewhere, which means you have 
to transcribe it).
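(For illustration only, the command-line form in question looks 
something like this, with the setup and the statement squeezed into 
quoted strings:)

python -m timeit \
  -s "import numpy as np; x = np.array([1, 2, 3]); y = np.array([4, 5, 6])" \
  "np.cross(x, y)"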


> But if you prefer doing it "old school" from within Python, then:
>
> from timeit import Timer
> t = Timer('np.cross(x, y)',  setup="""
> import numpy as np
> x = np.array([1, 2, 3])
> y = np.array([4, 5, 6])
> """)
>
> # take five measurements of 100000 calls each, and report the fastest
> result = min(t.repeat(number=100000, repeat=5))/100000
> print(result)  # time in seconds per call

> Better?

A bit, but the code is now inside a string!

Code will normally exist as a proper part of a module, not on the 
command line, in a command history, or in a string, so why not test it 
running inside a module?

But I've done a lot of benchmarking, and actually measuring execution 
time is just part of it. This test, for example, I ran from inside a 
function rather than at module level, as that is more typical.
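Something along these lines, roughly (not the exact code, just the 
shape of it):

import time
import numpy as np

def bench(n=100000):
    x = np.array([1, 2, 3])
    y = np.array([4, 5, 6])
    t0 = time.time()
    for i in range(n):            # loop overhead is small relative
        np.cross(x, y)            # to the work done in the body
    t1 = time.time()
    print("np.cross:", (t1 - t0) / n, "seconds per call")

bench()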

Are the variables inside a timeit string globals or locals? It's just 
a lot of extra factors to worry about, and extra things to get wrong.
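(For reference, timeit pastes the setup and the statement into the body 
of a generated function, so names bound in the setup string become 
locals of that function. Since Python 3.5 you can also pass your own 
namespace instead of a setup string; a minimal sketch:)

from timeit import Timer
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# globals=globals() (Python 3.5+) lets the timed statement see this
# module's names, so no setup string is needed
t = Timer('np.cross(x, y)', globals=globals())
print(min(t.repeat(number=100000, repeat=5)) / 100000)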

The loop timings used by the OP showed one took considerably longer than 
the other. And that was confirmed by others. There's nothing wrong with 
that method.

-- 
Bartc



