[Numpy-discussion] Generating random samples without repeats

Fri Sep 19 11:08:39 EDT 2008

Rick White <rlw <at> stsci.edu> writes:

> It seems like numpy.random.permutation is pretty suboptimal in its  
> speed.  Here's a Python 1-liner that does the same thing (I think)  
> but is a lot faster:
> 
> a = 1+numpy.random.rand(M).argsort()[0:N-1]
> 
> This still has the problem that it generates a size M array to  
> start with.  But at least it is fast compared with permutation:

Interesting. For my generation of a million samples, this takes about 46 sec 
vs the original 75. That's roughly a 39% reduction in run time (about 1.6x 
faster). As you mention, it doesn't help memory, which still peaks at around 
450M.
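
For reference, the argsort trick can be written out as a small function. This 
is only a sketch: the function name and the 52/5 example values are 
illustrative assumptions, not code from this thread. It slices with [:n] so 
that exactly n values come back (a slice of [0:n-1] would return n - 1).

```python
import numpy as np

def sample_without_repeats(m, n):
    """Return n distinct integers drawn uniformly from 1..m (inclusive).

    Hypothetical helper name, used here for illustration only.
    """
    if not 0 <= n <= m:
        raise ValueError("need 0 <= n <= m")
    # Sorting m random keys yields a random permutation of 0..m-1;
    # its first n entries are a sample without repeats.
    order = np.random.rand(m).argsort()
    return 1 + order[:n]  # [:n] keeps n values; [0:n-1] would keep n - 1

# Example: draw 5 distinct values from 1..52.
hand = sample_without_repeats(52, 5)
```

Note that this still builds the full length-m key array before slicing, so 
it has the same memory cost as the one-liner.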

This reminded me of J (http://www.jsoftware.com/), an APL derivative, which 
does this in a blistering 1.3 seconds, with no detectable memory overhead. Of 
course, being descended from APL, the code to do this is pretty obscure:

    5 ? (1000000 $ 52)

(Here, ? is the "deal" operator, and $ reshapes an array - so it's "deal 5 
from each item in a 1000000-long array of 52's". Everything is a primitive 
here, so it's not hard to see why it's fast.)
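
A rough NumPy counterpart of the J expression above might look like the 
following. It is a sketch under assumptions: the function name, the chunk 
parameter and its default are made up for illustration, and it uses 0-based 
values (0..51) rather than 1-based ones. It argsorts one row of random keys 
per deal, and works in chunks so that the temporary arrays scale with 
chunk * deck_size instead of n_deals * deck_size. No timing claims are made 
for it.

```python
import numpy as np

def deal_many(n_deals, deck_size=52, hand_size=5, chunk=100000):
    """Deal hand_size distinct values from 0..deck_size-1, n_deals times.

    Hypothetical helper; chunk bounds the size of the temporary arrays.
    """
    if not 0 <= hand_size <= deck_size:
        raise ValueError("need 0 <= hand_size <= deck_size")
    out = np.empty((n_deals, hand_size), dtype=np.intp)
    for start in range(0, n_deals, chunk):
        stop = min(start + chunk, n_deals)
        # One row of random keys per deal; argsort along each row gives
        # an independent random permutation of 0..deck_size-1.
        keys = np.random.rand(stop - start, deck_size)
        out[start:stop] = keys.argsort(axis=1)[:, :hand_size]
    return out

# Example: 1000 hands of 5 from a 52-card deck, in chunks of 256 deals.
hands = deal_many(1000, chunk=256)
```

Each row of the result is one deal; rows are independent of each other, and 
values never repeat within a row.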

A Python/Numpy <-> J bridge might be a fun exercise...

Paul.