[Numpy-discussion] Boolean arrays

Robert Kern robert.kern at gmail.com
Fri Aug 27 16:17:10 EDT 2010

On Fri, Aug 27, 2010 at 15:10, Ken Watford <kwatford+scipy at gmail.com> wrote:
> On Fri, Aug 27, 2010 at 3:58 PM, Brett Olsen <brett.olsen at gmail.com> wrote:
>> Hello,
>> I have an array of non-numeric data, and I want to create a boolean
>> array denoting whether each element in this array is a "valid" value
>> or not.  This is straightforward if there's only one possible valid
>> value:
>> >>> import numpy as N
>> >>> ar = N.array(("a", "b", "c", "b", "b", "a", "d", "c", "a"))
>> >>> ar == "a"
>> array([ True, False, False, False, False,  True, False, False,  True],
>> dtype=bool)
>> If there are multiple possible valid values, I've come up with a couple
>> of possible methods, but they all seem to be inefficient or kludges:
>> >>> valid = N.array(("a", "c"))
>> >>> (ar == valid[0]) | (ar == valid[1])
>> array([ True, False,  True, False, False,  True, False,  True,  True],
>> dtype=bool)
>> >>> N.array(map(lambda x: x in valid, ar))
>> array([ True, False,  True, False, False,  True, False,  True,  True],
>> dtype=bool)
>> Is there a numpy-appropriate way to do this?
>> Thanks,
>> Brett Olsen
> amap: Like Map, but for arrays.
> >>> ar = numpy.array(("a", "b", "c", "b", "b", "a", "d", "c", "a"))
> >>> valid = ('a', 'c')
> >>> numpy.amap(lambda x: x in valid, ar)
> array([ True, False,  True, False, False,  True, False,  True,  True],
> dtype=bool)

I'm not sure what version of numpy this would be in; I've never seen it.

But in any case, that would be very slow for large arrays since it
would invoke a Python function call for every value in ar. Instead,
iterate over the valid array, which is much shorter:

import numpy as np

# Start with all False, then OR in one vectorized comparison per valid value.
mask = np.zeros(ar.shape, dtype=bool)
for good in valid:
    mask |= (ar == good)

Wrap that up into a function and you're good to go. That's about as
efficient as it gets unless the valid array gets large.
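
For illustration, a minimal sketch of what wrapping the loop in a function
might look like. The function name valid_mask is hypothetical (it is not part
of numpy or of the original post), and the sample data is the array from the
question:

```python
import numpy as np

def valid_mask(ar, valid):
    """Return a boolean array that is True where `ar` equals any value in `valid`.

    `valid_mask` is a hypothetical helper name, not a numpy function.
    """
    ar = np.asarray(ar)
    mask = np.zeros(ar.shape, dtype=bool)
    # One vectorized comparison per valid value; the Python-level loop
    # runs len(valid) times, not len(ar) times.
    for good in valid:
        mask |= (ar == good)
    return mask

ar = np.array(["a", "b", "c", "b", "b", "a", "d", "c", "a"])
mask = valid_mask(ar, ["a", "c"])
print(mask)  # [ True False  True False False  True False  True  True]
```

The cost grows with len(valid) times len(ar), so this suits a short list of
valid values. Later numpy releases also provide np.isin(ar, valid), which
computes the same mask and may be the better choice when the valid array is
large.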

Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
