numpy performance and random numbers

Sat Dec 19 19:23:46 EST 2009

On Sat, 19 Dec 2009 09:02:38 -0800, Carl Johan Rehn wrote:

> Well, in Matlab I used "tic; for i = 1:1000, randn(100, 10000), end;
> toc" and in IPython I used a similar construct but with "time" instead
> of tic/toc.

I don't know if this will make any significant difference, but for the 
record that is not the optimal way to time a function in Python due to 
the overhead of creating the loop sequence.

In Python, the typical for-loop construct is something like this:

for i in range(1000):
    randn(100, 10000)

but range isn't just a funny way of writing "loop over these values": in 
Python 2.x it actually builds a list of integers, which has a measurable 
cost. You can reduce that cost by using xrange instead of range, but 
it's even better to pre-compute the sequence outside of the timing code. 
Better still is to use a loop sequence like [None]*1000 (precomputed 
outside of the timing code, naturally!), so that every iteration fetches 
the same object and the cost of memory accesses is minimized.
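
As a rough sketch of the effect, the snippet below times an empty loop 
twice: once with the sequence built inside the timed region, once with a 
precomputed [None]*N. The helper name time_empty_loop is made up for 
illustration, and list(range(N)) stands in for the list-building range of 
Python 2.x so that the comparison behaves the same on newer versions.

```python
import time

N = 1000

def time_empty_loop(make_sequence_inside, repeats=5):
    """Return the best wall-clock time for an empty loop of N iterations.

    Hypothetical helper, for illustration only.
    """
    precomputed = [None] * N          # built before the clock starts
    timings = []
    for _ in range(repeats):
        start = time.time()
        if make_sequence_inside:
            seq = list(range(N))      # building the sequence is timed too
        else:
            seq = precomputed         # only the iteration itself is timed
        for _item in seq:
            pass
        timings.append(time.time() - start)
    return min(timings)               # lowest timing is the least disturbed

inside = time_empty_loop(True)
outside = time_empty_loop(False)
print("sequence built inside timing: %.6f s" % inside)
print("sequence precomputed outside: %.6f s" % outside)
```

The absolute numbers are tiny and will vary from machine to machine; the 
point is only that the first figure includes work that has nothing to do 
with the function being measured.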

The best way to perform timings of small code snippets in Python is using 
the timeit module, which does all these things for you. Something like 
this:

from timeit import Timer
t = Timer('randn(100, 10000)', 'from numpy import randn')
print(min(t.repeat(repeat=3, number=1000)))

This prints the time taken by 1000 calls to randn, matching the Matlab 
loop. (Left at its default, number is one million, which is far too many 
for a call this expensive.) Because of the nature of modern operating 
systems, any one individual timing can be seriously impacted by other 
processes, so repeat does three independent timings and min picks the 
lowest. timeit will also automatically pick the best timer to use 
according to your operating system (either clock, which is best on 
Windows, or time, which is better on Posix systems).
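
To get a per-call figure out of this, something along the following lines 
should work. It is only a sketch: random.gauss from the standard library 
is used as a cheap stand-in for randn so that it runs quickly and without 
numpy, and the call count of 10000 is an arbitrary choice.

```python
from timeit import Timer

CALLS = 10000  # arbitrary; pick a count that suits the cost of the statement

# random.gauss is a cheap standard-library stand-in for randn(100, 10000)
t = Timer('gauss(0.0, 1.0)', 'from random import gauss')

# repeat() returns a list with one total time per run: 3 runs of CALLS calls
timings = t.repeat(repeat=3, number=CALLS)

# the lowest total is the run least disturbed by other processes
best = min(timings)
print("best of 3: %.6f s for %d calls" % (best, CALLS))
print("per call:  %.3e s" % (best / CALLS))
```

Dividing the best total by the call count gives a number that can be 
compared directly with a Matlab tic/toc total divided by its loop count.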

-- 
Steven