[Numpy-discussion] unique 2d arrays

Ernest Adrogué eadrogue at gmx.net
Tue Sep 21 14:35:00 EDT 2010


21/09/10 @ 12:55 (-0500), thus spake Gökhan Sever:
> On Tue, Sep 21, 2010 at 12:43 PM, <josef.pktd at gmail.com> wrote:
> 
> > I'm a bit surprised; I think np.unique does some extra work to
> > maintain the order.
> > The tolist() might not be necessary if you iterate over rows.
> >
> 
> Testing again with a smaller k array and more repeats:
> 
> I[25]: k = np.array((a.tolist()*5000))
> 
> I[27]: %timeit -r 100 np.array(list(set(tuple(i) for i in k.tolist())))
> 10 loops, best of 100: 31.3 ms per loop
> 
> I[28]: %timeit -r 100 np.array(list(set(tuple(i) for i in k)))
> 10 loops, best of 100: 55.4 ms per loop
> 
> I[30]: %timeit -r 100 np.unique(k.view([('',k.dtype)]*k.shape[1])).view(k.dtype).reshape(-1,k.shape[1])
> 10 loops, best of 100: 60.5 ms per loop
> 
> The .tolist() version is faster. Can you verify this as well?
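For reference, here is a minimal sketch that wraps the three one-liners timed above as functions and checks that they find the same set of rows. The function names and the seeded test array are hypothetical. The last function uses the `axis` argument of `np.unique`, which exists only in NumPy 1.13 and later (it postdates this thread), so it assumes a reasonably recent NumPy.

```python
import numpy as np


def unique_rows_tolist(a):
    # Convert to nested Python lists first, then hash each row as a tuple.
    return np.array(list(set(tuple(row) for row in a.tolist())))


def unique_rows_iter(a):
    # Iterate over the ndarray rows directly; tuple(row) holds NumPy scalars.
    return np.array(list(set(tuple(row) for row in a)))


def unique_rows_view(a):
    # View each row as one structured element (one field per column) so that
    # np.unique compares whole rows. The view needs a C-contiguous array.
    a = np.ascontiguousarray(a)
    row_dtype = [('', a.dtype)] * a.shape[1]
    uniq = np.unique(a.view(row_dtype))
    # Reinterpret the structured result as plain 2-D rows again.
    return uniq.view(a.dtype).reshape(-1, a.shape[1])


def unique_rows_axis(a):
    # NumPy >= 1.13 only: np.unique can operate on rows directly.
    return np.unique(a, axis=0)


def as_row_set(rows):
    # Order-independent representation used to compare the results.
    return {tuple(r) for r in np.asarray(rows).tolist()}


x = np.random.default_rng(0).poisson(1.3, size=1000).reshape(-1, 2)

results = [f(x) for f in (unique_rows_tolist, unique_rows_iter,
                          unique_rows_view, unique_rows_axis)]
assert all(as_row_set(r) == as_row_set(x) for r in results)
print(len(results[0]), "unique rows out of", len(x))
```

np.unique works by sorting, so the two np.unique variants return rows in sorted order, while the set-based versions return them in arbitrary order.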

I get the same results:

In [14]: x=np.random.poisson(1.3, size=100000).reshape(-1,2)

In [19]: %timeit np.array(tuple(set(map(tuple, x.tolist()))))
10 loops, best of 3: 86.5 ms per loop

In [20]: %timeit np.array(tuple(set(map(tuple, x))))
10 loops, best of 3: 125 ms per loop
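Why the .tolist() variant wins is not stated in the thread. A likely explanation (an assumption, not something measured here) is that iterating over a 2-D array creates a new 1-D array object for every row, and tuple(row) then boxes each element as a NumPy scalar, whereas .tolist() converts the whole array to plain Python ints in one C-level pass. The sketch below shows the element types and re-times both variants with the standard timeit module. The array mirrors x above but uses a seeded default_rng generator instead of np.random.poisson; timings will differ by machine and NumPy version.

```python
import timeit

import numpy as np

# Same shape of data as in the timings above (seeded here for repeatability).
x = np.random.default_rng(0).poisson(1.3, size=100000).reshape(-1, 2)

# Iterating the array yields 1-D row views; tuple() on a row gives NumPy scalars.
row_from_array = tuple(x[0])
# .tolist() converts everything to nested lists of plain Python ints up front.
row_from_list = tuple(x.tolist()[0])

print(type(row_from_array[0]))  # a NumPy integer type such as numpy.int64
print(type(row_from_list[0]))   # <class 'int'>

# Both variants build the same set: NumPy integers hash and compare like ints.
assert set(map(tuple, x.tolist())) == set(map(tuple, x))

n = 5
t_list = timeit.timeit(lambda: set(map(tuple, x.tolist())), number=n) / n
t_iter = timeit.timeit(lambda: set(map(tuple, x)), number=n) / n
print(f"via tolist: {t_list * 1e3:.1f} ms per loop")
print(f"direct:     {t_iter * 1e3:.1f} ms per loop")
```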

Bye.
Ernest


