Numpy slow at vector cross product?

Steve D'Aprano steve+python at pearwood.info
Mon Nov 21 22:00:45 EST 2016


On Tue, 22 Nov 2016 12:45 pm, BartC wrote:

> On 21/11/2016 14:50, Steve D'Aprano wrote:
>> On Mon, 21 Nov 2016 11:09 pm, BartC wrote:
> 
>> Modern machines run multi-tasking operating systems, where there can be
>> other processes running. Depending on what you use as your timer, you may
>> be measuring the time that those other processes run. The OS can cache
>> frequently used pieces of code, which allows it to run faster. The CPU
>> itself will cache some code.
> 
> You get to know after a while what kinds of processes affect timings. For
> example, streaming a movie at the same time.

Really, no.

You'll just have to take my word on this, but I'm not streaming any movies
at the moment. I don't even have a web browser running. And since I'm
running Linux, I don't have an anti-virus scanner that might have just
triggered a scan.

(But since I'm running Linux, I do have a web server, mail server, a DNS
server, cron, and about 300 other processes running, any of which might
start running for a microsecond or ten in the middle of a job.)

py> with Stopwatch():
...     x = math.sin(1.234)
...
elapsed time is very small; consider using the timeit module for
micro-timings of small code snippets
time taken: 0.007164 seconds


And again:

py> with Stopwatch():
...     x = math.sin(1.234)
...
elapsed time is very small; consider using the timeit module for
micro-timings of small code snippets
time taken: 0.000014 seconds


Look at the variation in the timing: 0.007164 versus 0.000014 seconds. That's
the influence of a cache, or more than one cache, somewhere. But if I run
it again:

py> with Stopwatch():
...     x = math.sin(1.234)
...
elapsed time is very small; consider using the timeit module for
micro-timings of small code snippets
time taken: 0.000013 seconds

there's a smaller variation, this time "only" 7%, for code which hasn't
changed. That's what you're up against.
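(For reference, Stopwatch here is nothing magic: it's just a simple context
manager along these lines. This is a sketch of what I'm using, not the exact
implementation, and it assumes Python 3.3+ for time.perf_counter.)

```python
import math
import time

class Stopwatch:
    """Sketch of a Stopwatch context manager; the real implementation
    isn't shown in this thread, so details here are assumptions."""

    def __init__(self, threshold=0.001):
        self.threshold = threshold  # warn below this many seconds

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, *exc_info):
        self.elapsed = time.perf_counter() - self.start
        if self.elapsed < self.threshold:
            print("elapsed time is very small; consider using the timeit")
            print("module for micro-timings of small code snippets")
        print("time taken: %f seconds" % self.elapsed)
        return False  # don't suppress exceptions

with Stopwatch():
    x = math.sin(1.234)
```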

Running a whole lot of loops can, sometimes, mitigate some of that
variation, but not always. Even when running in a loop, you can easily get
variation of 10% or more just at random.


> So when you need to compare timings, you turn those off.
> 
>> The shorter the code snippet, the more these complications are relevant.
>> In this particular case, we can be reasonably sure that the time it takes
>> to create a list range(10000) and the overhead of the loop is *probably*
>> quite a small percentage of the time it takes to perform 100000 vector
>> multiplications. But that's not a safe assumption for all code snippets.
> 
> Yes, it was one of those crazy things that Python used to have to do,
> creating a list of N numbers just in order to be able to count to N.

Doesn't matter. Even with xrange, you're still counting the cost of looking
up xrange, passing one or more arguments to it, parsing those arguments,
creating an xrange object, and iterating over that xrange object
repeatedly. None of those things are free.

You might *hope* that the cost of those things are insignificant compared to
what you're actually interested in timing, but you don't know. And you're
resisting the idea of using a tool that is specifically designed to avoid
measuring all that overhead.
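(If you doubt that the loop machinery itself costs anything, measure it:
time an empty statement and you get a baseline for the per-iteration
overhead. A quick sketch; the exact numbers will vary from run to run:)

```python
import timeit

# Timing "pass" measures just the loop machinery: small, but not zero.
overhead = timeit.timeit("pass", number=1000000)

# Timing real work includes that same machinery plus the statement itself.
work = timeit.timeit("x = 3 * 257", number=1000000)

print("empty loop: %.6f s per million iterations" % overhead)
print("with work:  %.6f s per million iterations" % work)
```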

It's okay that your intuitions about the cost of executing Python code are
inaccurate. What's not okay is your refusal to listen to those who have a
better idea of what's involved.


[...]
>> The timeit module automates a bunch of tricky-to-get-right best practices
>> for
>> timing code. Is that a problem?
> 
> The problem is it substitutes a bunch of tricky-to-get-right options and
> syntax which has to be typed /at the command line/. And you really don't
> want to have to write code at the command line (especially if sourced
> from elsewhere, which means you have to transcribe it).

You have to transcribe it no matter what you do, unless you are given
correctly written timing code.

You don't have to use timeit from the command line. But you're mad if you
don't: the smaller the code snippet, the more convenient it is.

[steve at ando ~]$ python2.7 -m timeit -s "x = 257" "3*x"
10000000 loops, best of 3: 0.106 usec per loop
[steve at ando ~]$ python3.5 -m timeit -s "x = 257" "3*x"
10000000 loops, best of 3: 0.137 usec per loop


That's *brilliant* and much simpler than anything you are doing with loops
and clocks and whatnot. It's simple, straightforward, and tells me exactly
what I expected to see. (Python 3.6 will be even better.)

For the record, the reason Python 3.5 is so much slower here is because it
is a debugging build.


>> But if you prefer doing it "old school" from within Python, then:
>>
>> from timeit import Timer
>> t = Timer('np.cross(x, y)',  setup="""
>> import numpy as np
>> x = np.array([1, 2, 3])
>> y = np.array([4, 5, 6])
>> """)
>>
>> # take five measurements of 100000 calls each, and report the fastest
>> result = min(t.repeat(number=100000, repeat=5))/100000
>> print(result)  # time in seconds per call
> 
>> Better?
> 
> A bit, but the code is now inside a string!

As opposed to source code, which is... a string.


> Code will normally exist as a proper part of a module, not on the
> command line, in a command history, or in a string, so why not test it
> running inside a module?

Sure, you can do that, if you want potentially inaccurate results.


> But I've done a lot of benchmarking and actually measuring execution
> time is just part of it. This test I ran from inside a function for
> example, not at module-level, as that is more typical.
> 
> Are the variables inside a time-it string globals or locals? It's just a
> lot of extra factors to worry about, and extra things to get wrong.
> 
> The loop timings used by the OP showed one took considerably longer than
> the other. And that was confirmed by others. There's nothing wrong with
> that method.

In this specific example, the OP is comparing two radically different pieces
of code that clearly and obviously perform differently. He's doing the
equivalent of timing the code with his heartbeat, and getting 50 beats for
one and 150 beats for the other. That's good enough to show gross
differences in performance.

But often you're comparing two code snippets which are very nearly the same,
and trying to tease out a real difference of (say) 3% out of a noisy signal
where each run may differ by 10% just from randomness. Using your heartbeat
to time code is not going to do it.
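(As for the globals-versus-locals question: since Python 3.5 you can hand
timeit the namespace explicitly via its globals argument, so there's nothing
to get wrong. A sketch, using repeat() and taking the minimum, since the
slower runs are mostly OS and cache noise:)

```python
import math
import timeit

# Pass the calling module's namespace so the timed statement can see
# names like "math" directly (timeit's globals parameter, Python 3.5+).
times = timeit.repeat("math.sin(1.234)", repeat=5, number=100000,
                      globals=globals())

# Take five measurements and report the fastest one.
print("best of 5: %.9f seconds per call" % (min(times) / 100000))
```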



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



