Numpy slow at vector cross product?

Tue Nov 22 00:14:14 EST 2016

On Tuesday 22 November 2016 14:00, Steve D'Aprano wrote:

> Running a whole lot of loops can, sometimes, mitigate some of that
> variation, but not always. Even when running in a loop, you can easily get
> variation of 10% or more just at random.

I think that needs to be emphasised: there's a lot of random noise in these 
measurements.

For big, heavyweight functions that do a lot of work, the noise is generally a 
tiny proportion, and you can safely ignore it. (At least for CPU-bound tasks; 
for I/O-bound tasks, the noise in the I/O itself is potentially very high.)

For really tiny operations, the noise *may* be small, depending on the 
operation.  But small is not insignificant. Consider a simple operation like 
addition:

# Python 3.5
import statistics
from timeit import Timer
t = Timer("x + 1", setup="x = 0")
# ten trials, of one million loops each
results = t.repeat(repeat=10)
best = min(results)
average = statistics.mean(results)
# relative spread: sample standard deviation as a fraction of the mean
std_error = statistics.stdev(results)/average
print("Best:", best)
print("Average:", average)
print("Std error:", std_error)

Best: 0.09761243686079979
Average: 0.0988507878035307
Std error: 0.02260956789268462

So this suggests that on my machine, doing no expensive virus scans or 
streaming video, the random noise in something as simple as integer addition is 
around two percent.

So that's your baseline: even simple operations repeated a million times 
will show random noise of a few percent.

Consequently, if you're doing one trial (one loop of, say, a million 
operations):

import time
x = 0
start = time.time()
for i in range(1000000):
    x + 1
elapsed = time.time() - start

and compare the time taken with another trial, and the difference is of the 
order of a few percentage points, then you have *no* reason to believe the 
result is real. You ought to repeat your test multiple times -- the more the 
better.
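
As a rough sketch of what "repeat your test" can look like in practice, the 
snippet below times two statements over several trials each and only treats 
the difference as interesting if it is larger than the spread between trials. 
The helper name summarise, the second statement "x + 2" and the trial and loop 
counts are my own choices for illustration, not anything prescribed by timeit.

```python
import statistics
from timeit import Timer

def summarise(stmt, setup="pass", repeat=5, number=100000):
    """Hypothetical helper: run several trials and return
    (best time, mean time, stdev as a fraction of the mean)."""
    results = Timer(stmt, setup=setup).repeat(repeat=repeat, number=number)
    mean = statistics.mean(results)
    return min(results), mean, statistics.stdev(results) / mean

best_a, mean_a, spread_a = summarise("x + 1", setup="x = 0")
best_b, mean_b, spread_b = summarise("x + 2", setup="x = 0")

# Relative difference between the two best times.
difference = abs(best_a - best_b) / min(best_a, best_b)
# Use the noisier of the two measurements as the yardstick.
noise = max(spread_a, spread_b)

print("difference: {:.1%}, noise: {:.1%}".format(difference, noise))
if difference <= noise:
    print("difference is within the noise; no reason to believe it is real")
else:
    print("difference exceeds the noise; worth a closer look")
```

This is a crude rule of thumb rather than a proper significance test, but it 
is enough to stop you reading meaning into a two percent difference.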

timeit makes it easy to repeat your tests. It automatically picks the best 
timer for your platform and avoids serious gotchas from using the wrong timer. 
When called from the command line, it will automatically select the best number 
of loops to ensure reliable timing, without wasting time doing more loops than 
needed.
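
Something similar is available from code, assuming you are on Python 3.6 or 
later (it is not in 3.5): Timer.autorange() keeps increasing the loop count 
until a single trial takes at least 0.2 seconds, then returns the loop count 
and the time taken. A minimal sketch:

```python
from timeit import Timer

# Roughly what this command line does for you:
#     python -m timeit -s "x = 0" "x + 1"
t = Timer("x + 1", setup="x = 0")

# autorange() raises the loop count until one trial takes >= 0.2 seconds
# and returns (number of loops, total time for that many loops).
number, total = t.autorange()
print("{} loops took {:.3f} seconds".format(number, total))
print("{:.1f} nanoseconds per loop".format(total / number * 1e9))
```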

timeit isn't magic. It's not doing anything that you or I couldn't do by hand, 
if we knew we should be doing it, and if we could be bothered to run multiple 
trials and gather statistics and keep a close eye on the deviation between 
measurements. But who wants to do that by hand?
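
For the curious, here is roughly what "by hand" would involve. This is only a 
sketch of the idea, not timeit's actual implementation; the function name 
time_by_hand and its parameters are made up for the example, and unlike timeit 
it also counts the overhead of calling a Python function on every loop.

```python
import gc
import statistics
import time

def time_by_hand(func, trials=5, loops=100000):
    """Hypothetical helper: time `loops` calls of func, `trials` times over."""
    results = []
    gc_was_enabled = gc.isenabled()
    gc.disable()  # timeit also turns off garbage collection while timing
    try:
        for _ in range(trials):
            start = time.perf_counter()  # high-resolution timer
            for _ in range(loops):
                func()
            results.append(time.perf_counter() - start)
    finally:
        if gc_was_enabled:
            gc.enable()
    return results

x = 0
results = time_by_hand(lambda: x + 1)
print("Best:", min(results))
print("Average:", statistics.mean(results))
print("Relative spread:", statistics.stdev(results) / statistics.mean(results))
```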

-- 
Steven
299792.458 km/s — not just a good idea, it’s the law!