[Numpy-discussion] extract elements of an array that are contained in another array?

Robert Cimrman cimrman3 at ntc.zcu.cz
Mon Jun 8 07:51:26 EDT 2009


Hi Josef,

thanks for the summary! I am responding below; later I will open an 
enhancement ticket.

josef.pktd at gmail.com wrote:
> On Sat, Jun 6, 2009 at 4:42 AM, Neil Crighton <neilcrighton at gmail.com> wrote:
>> Robert Cimrman <cimrman3 <at> ntc.zcu.cz> writes:
>>
>>> Anne Archibald wrote:
>>>
>>>> 1. add a keyword argument to intersect1d "assume_unique"; if it is not
>>>> present, check for uniqueness and emit a warning if not unique
>>>> 2. change the warning to an exception
>>>> Optionally:
>>>> 3. change the meaning of the function to that of intersect1d_nu if the
>>>> keyword argument is not present
>>>>
> 
> 1. merge _nu version into one function
> -------------------------------------------------------
> 
>>> You mean something like:
>>>
>>> def intersect1d(ar1, ar2, assume_unique=False):
>>>      if not assume_unique:
>>>          return intersect1d_nu(ar1, ar2)
>>>      else:
>>>          ... # the current code
>>>
>>> intersect1d_nu could be still exported to numpy namespace, or not.
>>>
>> +1 - from the user's point of view there should just be intersect1d and
>> setmember1d (i.e. no '_nu' versions). The assume_unique keyword Robert suggests
>> can be used if speed is a problem.
> 
> + 1 on rolling the _nu versions this way into the plain version, this
> would avoid a lot of the confusion.
> It would not be a code breaking API change for existing correct usage
> (but some speed regression without adding keyword)

+1
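
A minimal sketch of the merged function discussed above, assuming the 
sort-based approach; the name intersect1d_sketch is hypothetical and this 
is not necessarily the code that ends up in numpy:

```python
import numpy as np

def intersect1d_sketch(ar1, ar2, assume_unique=False):
    """Sorted common elements of ar1 and ar2 (illustrative only)."""
    if not assume_unique:
        # Default branch: behave like the old _nu version.
        ar1 = np.unique(ar1)
        ar2 = np.unique(ar2)
    # With unique inputs, a value common to both arrays appears exactly
    # twice in the sorted concatenation, as two adjacent equal entries.
    aux = np.concatenate((ar1, ar2))
    aux.sort()
    return aux[:-1][aux[1:] == aux[:-1]]

common = intersect1d_sketch([1, 3, 3, 4], [3, 1, 1, 2])
print(common)  # [1 3]
```

Callers that already hold unique arrays can pass assume_unique=True to skip 
the two np.unique calls, which is where the speed difference comes from.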

> deprecate intersect1d_nu
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> intersect1d_nu could be still exported to numpy namespace, or not.
> I would say not, if they are the default branch of the non _nu version
> 
> +1 on deprecation

+0

> 2. alias as "in"
> ---------------------
>> I really like in1d (no underscore) as a new name for setmember1d_nu. inarray is
>> another possibility. I don't like 'ain'; 'a' in front of 'in' detracts from
>> readability, unlike the extra a in arange.
> I don't like the extra "a"s either, once namespaces are commonly used
> 
> alias setmember1d_nu as `in1d` or `isin1d`, because the function is an
> "in" test and not a set operation
> +1

+1
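
To illustrate the "in" semantics behind the proposed name, here is a small 
sketch of an elementwise membership test. The name in1d_sketch and the 
searchsorted-based approach are assumptions made for illustration, not the 
setmember1d_nu implementation:

```python
import numpy as np

def in1d_sketch(ar1, ar2):
    """Boolean mask, True where an element of ar1 occurs in ar2."""
    ar1 = np.asarray(ar1).ravel()
    ar2 = np.unique(ar2)  # sorted, duplicates removed
    if ar2.size == 0:
        return np.zeros(ar1.shape, dtype=bool)
    # Position where each element of ar1 would be inserted into ar2.
    idx = np.searchsorted(ar2, ar1)
    # Elements larger than everything in ar2 get index ar2.size;
    # point them at slot 0, the comparison below rejects them anyway.
    idx[idx == ar2.size] = 0
    return ar2[idx] == ar1

mask = in1d_sketch([0, 1, 2, 5, 0], [0, 2])
print(mask)  # [ True False  True False  True]
```

The result has the shape of the first argument and may contain repeated 
values, which is why this reads as "in" rather than as a set operation.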

> 3. behavior of other set functions
> -----------------------------------------------
> 
> guarantee that setdiff1d works for non-unique arrays (even when
> implementation changes), and change documentation
> +1

+1, it is useful for non-unique arrays.
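
A short usage example of setdiff1d on non-unique input. The output shown is 
what a recent NumPy release returns; I am assuming that matches the 
behaviour to be guaranteed:

```python
import numpy as np

# Both inputs contain repeated values.
a = [1, 2, 3, 3, 4]
b = [2, 2]

# Sorted, unique values of a that do not occur in b.
diff = np.setdiff1d(a, b)
print(diff)  # [1 3 4]
```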

> need to check other functions
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> union1d:  works for non-unique arrays, obvious from source

Yes.

> setxor1d: requires unique arrays
> >>> np.setxor1d([1,2,3,3,4,5], [0,0,1,2,2,6])
> array([2, 4, 5, 6])
> >>> np.setxor1d(np.unique([1,2,3,3,4,5]), np.unique([0,0,1,2,2,6]))
> array([0, 3, 4, 5, 6])
> 
> setxor: add keyword option and call unique by default
> +1 for symmetry

+1 - you mean np.setxor1d(np.unique(a), np.unique(b)) to become 
np.setxor1d(a, b, assume_unique=False), right?
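
A sketch of what that could look like, assuming the same keyword as 
proposed for intersect1d. The name setxor1d_sketch is hypothetical and the 
body is only one possible sort-based implementation:

```python
import numpy as np

def setxor1d_sketch(ar1, ar2, assume_unique=False):
    """Sorted values that are in exactly one of ar1, ar2 (illustrative)."""
    if not assume_unique:
        # Default branch: make the inputs unique first.
        ar1 = np.unique(ar1)
        ar2 = np.unique(ar2)
    aux = np.concatenate((ar1, ar2))
    if aux.size == 0:
        return aux
    aux.sort()
    # flag marks the boundaries between runs of equal values; a value
    # belongs to the result if it differs from both of its neighbours.
    flag = np.concatenate(([True], aux[1:] != aux[:-1], [True]))
    return aux[flag[1:] & flag[:-1]]

xor = setxor1d_sketch([1, 2, 3, 3, 4, 5], [0, 0, 1, 2, 2, 6])
print(xor)  # [0 3 4 5 6]
```

With the default assume_unique=False this reproduces the result of the 
quoted call that wraps both inputs in np.unique.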

> ediff1d and unique1d are defined for non-unique arrays

yes

> 4. name of keyword
> ----------------------------
> 
> intersect1d(ar1, ar2, assume_unique=False)
> 
> alternative isunique=False  or just unique=False
> +1 less to write

We should look at other functions in numpy (and/or scipy) to see what the 
common scheme is here. -1e-1 to the proposed names, as isunique is singular 
only, and unique=False does not clearly show the intent to me. What 
about ar1_unique=False, ar2_unique=False - to address each argument 
specifically?
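
A sketch of the per-argument variant, only to show how the two flags would 
act independently. The function name and both keyword names are 
hypothetical, nothing like this exists in numpy:

```python
import numpy as np

def intersect1d_per_arg(ar1, ar2, ar1_unique=False, ar2_unique=False):
    """Intersection with one uniqueness flag per input (illustrative)."""
    # Only the inputs not declared unique pay for a call to np.unique.
    if not ar1_unique:
        ar1 = np.unique(ar1)
    if not ar2_unique:
        ar2 = np.unique(ar2)
    aux = np.concatenate((ar1, ar2))
    aux.sort()
    # Common values show up as adjacent equal entries.
    return aux[:-1][aux[1:] == aux[:-1]]

# ar1 is known to be unique, ar2 is not.
res = intersect1d_per_arg([1, 2, 3], [3, 3, 1], ar1_unique=True)
print(res)  # [1 3]
```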

> 5. module name
> -----------------------
> 
> rename arraysetops to something easier to read like setfun. I think it
> would only affect internal changes since all functions are exported to
> the main numpy name space
> +1e-4  (I got used to arrayse_tops)

+0 (internal change only). Other numpy/scipy submodules containing a 
bunch of functions are called *pack (fftpack, arpack, lapack), *alg 
(linalg), *utils. *fun is used commonly in the MATLAB world.

> 6. keep docs in sync with correct usage
> ---------------------------------------------------------
> 
> obvious

+1

thanks,
r.



