sampling without replacement

Thu Apr 17 14:56:53 EDT 2008

On Apr 17, 2:41 am, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> braver <delivera... at gmail.com> writes:
> > Using an array is natural here as it represents "without replacement"
> > -- we take an element by removing it from the array.  But in Python
> > it's very slow...  What approaches are there to implement a shrinking
> > array with random deletions with  the magnitude of millions of
> > elements?
>
> The obvious way is use random.sample(), is there some reason you
> don't do that?
>
> Alternatively, when you select an element a[k] from the middle of the
> array, instead of deleting that element and moving the rest down (del
> a[k]), do something like:
>
>    k = random.randint(0,len(a)-1)
>    selection = a[k]
>    a[k] = a[-1]
>    a.pop()
>
> That deletes the last element (avoiding moving them around) after
> storing it in the newly freed slot.  Of course it messes up the order
> of the array, which won't matter for selecting random elements, but
> might matter for some other reason in your program.

What about an initial random shuffle on the array and then
always pop the last element? Or for that matter, why even pop it,
why not just take succesively larger negative indexes as the
"last" element?