[Numpy-discussion] extract elements of an array that are contained in another array?

josef.pktd at gmail.com josef.pktd at gmail.com
Sat Jun 6 07:41:32 EDT 2009


On Sat, Jun 6, 2009 at 4:42 AM, Neil Crighton <neilcrighton at gmail.com> wrote:
> Robert Cimrman <cimrman3 <at> ntc.zcu.cz> writes:
>
>> Anne Archibald wrote:
>>
>> > 1. add a keyword argument to intersect1d "assume_unique"; if it is not
>> > present, check for uniqueness and emit a warning if not unique
>> > 2. change the warning to an exception
>> > Optionally:
>> > 3. change the meaning of the function to that of intersect1d_nu if the
>> > keyword argument is not present
>> >

1. merge _nu version into one function
-------------------------------------------------------

>> You mean something like:
>>
>> def intersect1d(ar1, ar2, assume_unique=False):
>>      if not assume_unique:
>>          return intersect1d_nu(ar1, ar2)
>>      else:
>>          ... # the current code
>>
>> intersect1d_nu could be still exported to numpy namespace, or not.
>>
>
> +1 - from the user's point of view there should just be intersect1d and
> setmember1d (i.e. no '_nu' versions). The assume_unique keyword Robert suggests
> can be used if speed is a problem.

+1 on rolling the _nu versions into the plain versions this way; it
would avoid a lot of the confusion.
It would not be a code-breaking API change for existing correct usage
(but there would be some speed regression for callers who do not add
the keyword).
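
A minimal sketch of what the merged function might look like, filling in
the "current code" branch of Robert's outline with a sort-and-compare
approach. The name intersect1d_sketch is hypothetical and the body is an
assumption, not the actual numpy source:

```python
import numpy as np

def intersect1d_sketch(ar1, ar2, assume_unique=False):
    """Sorted common elements of two 1-D arrays (illustrative only)."""
    if not assume_unique:
        # Safe default: remove duplicates first, at the cost of two extra sorts.
        ar1 = np.unique(ar1)
        ar2 = np.unique(ar2)
    # With unique inputs, a value common to both appears exactly twice
    # in the sorted concatenation.
    aux = np.concatenate((ar1, ar2))
    aux.sort()
    return aux[:-1][aux[1:] == aux[:-1]]

print(intersect1d_sketch([1, 2, 2, 3], [2, 2, 3, 4]))  # [2 3]
```

Passing assume_unique=True skips the two np.unique calls, which is the
speed gain; the result is only meaningful if the inputs really are unique.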

deprecate intersect1d_nu
^^^^^^^^^^^^^^^^^^^^^^^^
> intersect1d_nu could be still exported to numpy namespace, or not.
I would say not, if it becomes the default branch of the non-_nu version

+1 on deprecation


2. alias as "in"
---------------------
>
> I really like in1d (no underscore) as a new name for setmember1d_nu. inarray is
> another possibility. I don't like 'ain'; 'a' in front of 'in' detracts from
> readability, unlike the extra a in arange.
I don't like the extra "a"s either, once namespaces are commonly used

alias setmember1d_nu as `in1d` or `isin1d`, because the function is an
"in" test and not a set operation
+1
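
To illustrate why this reads as an "in" test: the function returns a
boolean mask over the first array, which can then be used to extract the
matching elements (the question in the subject line). A rough sketch,
assuming 1-D inputs; in1d_sketch is a hypothetical name and the
searchsorted approach is my assumption, not the setmember1d_nu source:

```python
import numpy as np

def in1d_sketch(ar1, ar2):
    """Boolean mask: True where an element of ar1 occurs in ar2."""
    ar1 = np.asarray(ar1)
    ar2 = np.unique(ar2)  # sorted, duplicates removed
    if ar2.size == 0:
        return np.zeros(ar1.shape, dtype=bool)
    # Position where each element of ar1 would be inserted into sorted ar2.
    idx = np.searchsorted(ar2, ar1)
    # Elements larger than everything in ar2 get index ar2.size; clamp them
    # to a valid index (the equality test below rejects them anyway).
    idx[idx == ar2.size] = 0
    return ar2[idx] == ar1

a = np.array([0, 1, 2, 5, 0])
mask = in1d_sketch(a, [0, 2])
print(mask)     # [ True False  True False  True]
print(a[mask])  # [0 2 0]
```

Neither input needs to be unique here, and the order and duplicates of
the first array are preserved in the result.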

>
> Can we summarise the discussion in this thread and write up a short proposal
> about what we'd like to change in arraysetops, and how to make the changes?
> Then it's easy for other people to give their opinion on any changes. I can do
> this if no one else has time.
>

other points

3. behavior of other set functions
-----------------------------------------------

guarantee that setdiff1d works for non-unique arrays (even when
implementation changes), and change documentation
+1
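
A quick check of the behavior being asked for, written against a current
NumPy release (an assumption; the implementation under discussion in this
thread may differ in detail):

```python
import numpy as np

a = np.array([1, 2, 2, 3, 3, 5])  # non-unique input
b = np.array([2, 2, 4])           # non-unique input

# Values of a that do not occur in b, sorted and without duplicates.
diff = np.setdiff1d(a, b)
print(diff)  # [1 3 5]
```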

need to check other functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
union1d:  works for non-unique arrays, obvious from source

setxor1d: requires unique arrays
>>> np.setxor1d([1,2,3,3,4,5], [0,0,1,2,2,6])
array([2, 4, 5, 6])
>>> np.setxor1d(np.unique([1,2,3,3,4,5]), np.unique([0,0,1,2,2,6]))
array([0, 3, 4, 5, 6])

setxor: add keyword option and call unique by default
+1 for symmetry
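
A sketch of what the keyword version could look like, following the same
pattern as the intersect1d proposal. The name setxor1d_sketch is
hypothetical and the body is an assumption, not the numpy source:

```python
import numpy as np

def setxor1d_sketch(ar1, ar2, assume_unique=False):
    """Sorted values that occur in exactly one of the two inputs."""
    if not assume_unique:
        # Call unique by default so that non-unique inputs are handled.
        ar1 = np.unique(ar1)
        ar2 = np.unique(ar2)
    aux = np.concatenate((ar1, ar2))
    if aux.size == 0:
        return aux
    aux.sort()
    # flag[i] is True where a new value starts in the sorted array; a value
    # that occurs only once is flagged on both of its sides.
    flag = np.concatenate(([True], aux[1:] != aux[:-1], [True]))
    return aux[flag[1:] & flag[:-1]]

print(setxor1d_sketch([1, 2, 3, 3, 4, 5], [0, 0, 1, 2, 2, 6]))
# [0 3 4 5 6]
```

With the default assume_unique=False this reproduces the second result in
the session above without the caller having to call np.unique first.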

ediff1d and unique1d are defined for non-unique arrays


4. name of keyword
----------------------------

intersect1d(ar1, ar2, assume_unique=False)

alternative isunique=False  or just unique=False
+1 less to write


5. module name
-----------------------

rename arraysetops to something easier to read, like setfun. I think it
would only require internal changes, since all functions are exported to
the main numpy namespace
+1e-4  (I got used to arrayse_tops)


6. keep docs in sync with correct usage
---------------------------------------------------------

obvious


That's my summary and opinions

Josef

>
> Neil
>
>
