[Numpy-discussion] Generating random samples without repeats
Paul Moore
pf_moore at yahoo.co.uk
Fri Sep 19 11:08:39 EDT 2008
Rick White <rlw <at> stsci.edu> writes:
> It seems like numpy.random.permutation is pretty suboptimal in its
> speed. Here's a Python 1-liner that does the same thing (I think)
> but is a lot faster:
>
> a = 1+numpy.random.rand(M).argsort()[0:N]
>
> This still has the problem that it generates a size M array to
> start with. But at least it is fast compared with permutation.
Interesting. For my generation of a million samples, this takes about 46 sec
vs 75 sec for my original version, which cuts the run time by nearly 40%
(roughly 1.6 times as fast). As you mention, it doesn't help memory, which
still peaks at around 450 MB.
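The one-liner draws one sample at a time, but the same argsort trick can be applied to a whole batch by sorting each row of a 2-D array of random keys, which removes the Python-level loop. This is a minimal sketch, assuming `m` is the deck size and `n` the number of cards dealt (matching M and N in the quoted line); the name `deal_argsort` is hypothetical, and it uses the `numpy.random.default_rng` Generator API from NumPy 1.17+, which is newer than this thread.

```python
import numpy as np

def deal_argsort(samples, m, n, rng=None):
    """Return a (samples, n) array; each row holds n distinct values
    from 1..m (hypothetical helper name).

    Ranking m independent uniform keys gives a uniformly random
    permutation of 0..m-1, so the first n ranks form a sample
    without repeats.
    """
    rng = np.random.default_rng() if rng is None else rng
    keys = rng.random((samples, m))     # float64: samples * m * 8 bytes
    order = keys.argsort(axis=1)        # one random permutation per row
    return order[:, :n] + 1             # shift to 1-based card numbers

hands = deal_argsort(10, 52, 5)
print(hands.shape)   # (10, 5)
```

The catch is memory: the float64 keys need samples × m × 8 bytes, which for 1,000,000 rows of 52 is 416,000,000 bytes (about 416 MB), and `argsort` returns an index array of the same shape (int64 on 64-bit platforms), so very large batches would need to be processed in chunks.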
This reminded me of J (http://www.jsoftware.com/), an APL derivative, which
does this in a blistering 1.3 seconds, with no detectable memory overhead.
Of course, since J is descended from APL, the code to do this is pretty
obscure:
5 ? (1000000 $ 52)
(Here, ? is the "deal" operator and $ reshapes an array, so it reads "deal 5
from each item in a 1000000-long array of 52s". Everything here is a
primitive, so it's not hard to see why it's fast.)
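Part of J's advantage is that it never materialises all 52 keys per row. One way to get a similar effect in NumPy, keeping the working memory proportional to the 5 cards actually dealt, is Floyd's sampling algorithm; note this is a swapped-in technique, not something proposed in this thread. The sketch vectorises it across rows. `deal` is a hypothetical helper that loosely mimics J's `x ? y` for a batch of identical `y` values and, like J, returns 0-based values (J's deal chooses from `i.y`, that is 0 to y-1).

```python
import numpy as np

def deal(x, y, count, rng=None):
    """Hypothetical helper: `count` rows, each holding x distinct values
    from 0..y-1 (0-based, like J's i.y), via Floyd's algorithm.

    Working memory is O(count * x) rather than O(count * y).
    """
    if not 0 <= x <= y:
        raise ValueError("need 0 <= x <= y")
    rng = np.random.default_rng() if rng is None else rng
    out = np.empty((count, x), dtype=np.int64)
    for k, j in enumerate(range(y - x, y)):
        # Draw a candidate uniformly from 0..j (inclusive) for every row.
        t = rng.integers(0, j + 1, size=count)
        # If a row already holds t, take j instead; earlier steps only
        # produced values below j, so j is guaranteed to be new.
        seen = (out[:, :k] == t[:, None]).any(axis=1)
        out[:, k] = np.where(seen, j, t)
    return out

hands = deal(5, 52, 10)       # roughly J's 5 ? (10 $ 52)
print(hands.shape)            # (10, 5)
```

Each row's set of values is uniformly distributed, but the order within a row is not uniformly random, so shuffle each row afterwards if order matters (for example with `Generator.permuted(..., axis=1)` in NumPy 1.20 or later). For a million hands of 5 the int64 result is 1,000,000 × 5 × 8 bytes, about 40 MB.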
A Python/Numpy <-> J bridge might be a fun exercise...
Paul.