[Numpy-discussion] extract elements of an array that are contained in another array?

Thu Jun 4 13:27:25 EDT 2009

On Thu, Jun 4, 2009 at 12:32 PM, Alan G Isaac <aisaac at american.edu> wrote:
> On 6/4/2009 11:29 AM josef.pktd at gmail.com apparently wrote:
>> intersect1d  is the intersection between sets (which are stored as
>> arrays), just like in the mathematical definition the two sets only
>> have unique elements
>
> Hmmm. OK, I see you and Robert believe this.
> But it does not match the documentation.
> But indeed, I see that the documentation is incorrect.
> E.g.,
>
>>>> np.intersect1d([1,1,2,3,3,4],[1,4])
> array([1, 1, 3, 4])
>
> Is this a bug or a documentation bug?
>
>
>
>> intersect1d_nu is the intersection between two arrays which can have
>> repeated elements. The result is a set, i.e. unique elements, stored
>> as an array
>
>> same for setmember1d, setmember1d_nu
>
> I cannot understand this.
> Following your proposed reasoning,
> I expect a[setmember1d_nu(a,b)]
> to return the same as
> intersect1d_nu(a, b).
> It does not.

I don't have setmember1d_nu available right now, but from my reading
we should have

 intersect1d_nu(a, b) == np.unique(a[setmember1d_nu(a,b)])
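
This relation can be tried without the `_nu` functions. A minimal sketch, assuming a later NumPy release in which np.isin exists (it is not one of the functions discussed in this thread) and in which np.intersect1d itself accepts repeated elements. The arrays a and b are made up for illustration.

```python
import numpy as np

# Illustrative inputs with repeated elements (not taken from the thread).
a = np.array([4, 1, 1, 3, 3])
b = np.array([3, 4, 3, 0])

# Assumption: np.isin stands in for setmember1d_nu here.
mask = np.isin(a, b)            # True where an element of a occurs in b
via_mask = np.unique(a[mask])   # unique, sorted members of a found in b

# Assumption: np.intersect1d in later releases handles non-unique input,
# which is the role intersect1d_nu has in this discussion.
via_intersect = np.intersect1d(a, b)

print(via_mask, via_intersect)
```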

>
>
>
>> so  postfix `_nu` only means that this function also works
>> if the two arrays are not really sets
>
> But that just begs the question: what does 'works' mean?
> See my previous comment (above).
>
>
>
>> intersect1d should throw a domain error if you give it arrays with
>> non-unique elements, which is not done for speed reasons
>
> *If* intersect1d behaved *exactly* as documented,
> the example
> intersect1d(a, np.unique(b))
> shows that the documented behavior can be useful.
> And indeed, this would be the match to
> a[setmember1d_nu(a,b)]

I don't know if anyone has looked at the behavior for "unintended" usage.

intersect1d rearranges (sorts) the elements
>>> np.intersect1d([4,1,3,3],[3,4])
array([3, 3, 4])

but it gives you the correct multiplicity
>>> np.intersect1d([4,4,4,1,3,3],np.unique([3,4,3,0]))
array([3, 3, 4, 4, 4])

so I guess that, with a = [4,4,4,1,3,3] and b = [3,4,3,0], we have
np.intersect1d(a, np.unique(b)) ==
np.sort(a[setmember1d_nu(a,b)])
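
The right-hand side can be checked directly for this example. A sketch only: np.isin from later NumPy releases is used as a stand-in for setmember1d_nu, which is an assumption and not something established in this thread.

```python
import numpy as np

a = np.array([4, 4, 4, 1, 3, 3])
b = np.array([3, 4, 3, 0])

# Assumption: np.isin plays the role of setmember1d_nu.
# Keep every element of a that occurs in b, preserving repeats, then sort.
kept = np.sort(a[np.isin(a, b)])

print(kept)
```

The result agrees with the array([3, 3, 4, 4, 4]) shown above.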

For the example from the help file, I can't find any meaningful interpretation:
>>> np.intersect1d([1,3,3],[3,1,1])
array([1, 1, 3, 3])
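
One way to read these outputs: they are all consistent with an implementation that concatenates the two arrays, sorts the result, and keeps every element that equals its right-hand neighbour. The sketch below is a reconstruction under that assumption, not the actual NumPy source, and `adjacent_duplicates` is a hypothetical name.

```python
import numpy as np

def adjacent_duplicates(ar1, ar2):
    """Hypothetical helper: keep each sorted element that equals its successor."""
    aux = np.sort(np.concatenate((ar1, ar2)))
    # Compare each element with the one after it; the last has no successor.
    return aux[:-1][aux[1:] == aux[:-1]]

# Both inputs unique: behaves like a set intersection.
r_sets = adjacent_duplicates([4, 1, 3], [3, 4])
# The help-file example with repeats in both inputs.
r_help = adjacent_duplicates([1, 3, 3], [3, 1, 1])
# Repeats in the first input only, second input made unique.
r_mult = adjacent_duplicates([4, 4, 4, 1, 3, 3], np.unique([3, 4, 3, 0]))

print(r_sets, r_help, r_mult)
```

Under this reading, a value that occurs n times in the concatenated array is kept n-1 times. With two genuine sets that means each common value is kept once, while with repeated elements the count depends on both inputs at once, which would explain why the help-file example has no clean interpretation.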

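setmember1d gives a wrong answer when the first array has repeated
elements (the first 1 is reported as a member of [3, 4]):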
>>> np.setmember1d([4,1,1,3,3],[3,4])
array([ True,  True, False,  True,  True], dtype=bool)
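
For comparison, a sketch of the elementwise membership test one would expect for this input, assuming a later NumPy release that provides np.isin (not a function discussed in this thread):

```python
import numpy as np

a = np.array([4, 1, 1, 3, 3])
b = np.array([3, 4])

# Assumption: np.isin is available (later NumPy releases).
# It tests each element of a for membership in b, repeats included.
expected = np.isin(a, b)

print(expected)
```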

Note: there are two versions of the docs for np.intersect1d: the
currently published docs, which describe the actual behavior (for the
non-unique case), and the new docs in the doc editor
http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/
which describe the "intended" usage of the functions and also
correspond more closely to the original source docstring
(http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/?revision=-227
). That's my interpretation.

If you think that the functions also make sense for the "unintended"
usage, then you could add an example to the new docs.

Josef