[Numpy-discussion] intersect1d and setmember1d

Sat Feb 28 08:56:42 EST 2009

mudit sharma <mudit_19a <at> yahoo.com> writes:

> intersect1d and setmember1d doesn't give expected results in case there are
duplicate values in either
> array becuase it works by sorting data and substracting previous value. Is
there an alternative in numpy
> to get indices of intersected values.
> 
> In [31]: p nonzero(setmember1d(v1.Id, v2.Id))[0]
> [ 0  1  2  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25  
                     <-------------- index 2 shouldn't be here look at the
> data below.
>  26 27 28 29]
> 
> In [32]: p v1.Id[:10]
> [ 232.  232.  233.  233.  234.  234.  235.  235.  237.  237.]
> 
> In [33]: p v2.Id[:10]
> [ 232.  232.  234.  234.  235.  235.  236.  236.  237.  237.]
> 

As far as I know there isn't an obvious way to get the functionality of
setmember1d working on non-unique inputs. However, I've needed this operation
quite a lot, so here's a function I wrote that does it. It's only a few times
slower than numpy's setmember1d. You're welcome to use it.

import numpy as np

def ismember(a1,a2):
    """ Test whether items from a2 are in a1.

    This does the same thing as np.setmember1d, but works on
    non-unique arrays.

    Only a few (2-4) times slower than np.setmember1d, and a lot
    faster than [i in a2 for i in a1].

    An example that np.setmember1d gets wrong: 

    >>> a1 = np.array([5,4,5,3,4,4,3,4,3,5,2,1,5,5])
    >>> a2 = [2,3,4]
    >>> mask = ismember(a1,a2)
    >>> a1[mask]
    array([4, 3, 4, 4, 3, 4, 3, 2])
    """
    a2 = set(a2)
    a1 = np.asarray(a1)
    ind = a1.argsort()
    a1 = a1[ind]
    mask  = []
    # need this bit because prev is not defined for first item
    item  = a1[0]
    if item in a2:
        mask.append(True)
        a2.remove(item)
    else:
        mask.append(False)
    prev = item
    # main loop
    for item in a1[1:]:
        if item == prev:
            mask.append(mask[-1])
        elif item in a2:
            mask.append(True)
            prev = item
            a2.remove(item)
        else:
            mask.append(False)
            prev = item
    # restore mask to original ordering of a1 and return
    mask = np.array(mask)
    return mask[ind.argsort()]