[Numpy-discussion] improving arraysetops

Sun Jun 14 18:40:50 EDT 2009

Robert Cimrman <cimrman3 <at> ntc.zcu.cz> writes:

> 
> Hi,
> 
> I am starting a new thread, so that it reaches the interested people.
> Let us discuss improvements to arraysetops (array set operations) at [1] 
> (allowing non-unique arrays as function arguments, better naming 
> conventions and documentation).
> 
> r.
> 
> [1] http://projects.scipy.org/numpy/ticket/1133
> 

Hi,

These changes look good to me.  For point (1) I think we should fold the 
unique and _nu code into a single function. For point (3) I like in1d - it's 
shorter than isin1d but is still clear.

What about merging unique and unique1d?  They're essentially identical for an 
array input, but unique uses the builtin set() for non-array inputs and so is 
roughly 2-3x faster in this case - see the timings below. Is it worth accepting a speed 
regression for unique to get rid of the function duplication?  (Or can they be 
combined?) 

Neil

In [24]: l = list(np.random.randint(100, size=10000))
In [25]: %timeit np.unique1d(l)
1000 loops, best of 3: 1.9 ms per loop
In [26]: %timeit np.unique(l)
1000 loops, best of 3: 793 µs per loop
In [27]: l = list(np.random.randint(100, size=1000000))
In [28]: %timeit np.unique(l)
10 loops, best of 3: 78 ms per loop
In [29]: %timeit np.unique1d(l)
10 loops, best of 3: 233 ms per loop
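
For anyone who wants to see the two strategies side by side, here is a rough
sketch. It is not the actual numpy source: unique_via_set and unique_via_sort
are hypothetical helper names, and the sort-based variant is only my assumption
of how an array-oriented unique works (convert to an array, sort, keep the
first element of each run). The set-based variant mirrors the builtin set()
path mentioned above for non-array inputs.

```python
import numpy as np


def unique_via_set(seq):
    """Hypothetical helper: deduplicate with the builtin set(), then sort."""
    return np.array(sorted(set(seq)))


def unique_via_sort(seq):
    """Hypothetical helper: convert to an array, sort, keep first of each run."""
    arr = np.asarray(seq).ravel()
    if arr.size == 0:
        return arr
    arr = np.sort(arr)
    # An element is kept if it differs from its predecessor in sorted order.
    keep = np.concatenate(([True], arr[1:] != arr[:-1]))
    return arr[keep]


# Same kind of input as in the timings: a plain Python list of small integers.
rng = np.random.RandomState(0)
values = list(rng.randint(100, size=10000))

a = unique_via_set(values)
b = unique_via_sort(values)
```

Both helpers return the same sorted unique values. For a list input the
sort-based one has to pay for the list-to-array conversion before it can do
any work, which is one plausible source of the gap in the timings.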