[Numpy-discussion] Boolean arrays

Robert Kern robert.kern at gmail.com
Fri Aug 27 16:17:10 EDT 2010


On Fri, Aug 27, 2010 at 15:10, Ken Watford <kwatford+scipy at gmail.com> wrote:
> On Fri, Aug 27, 2010 at 3:58 PM, Brett Olsen <brett.olsen at gmail.com> wrote:
>> Hello,
>>
>> I have an array of non-numeric data, and I want to create a boolean
>> array denoting whether each element in this array is a "valid" value
>> or not.  This is straightforward if there's only one possible valid
>> value:
>>>>> import numpy as N
>>>>> ar = N.array(("a", "b", "c", "b", "b", "a", "d", "c", "a"))
>>>>> ar == "a"
>> array([ True, False, False, False, False,  True, False, False,  True],
>> dtype=bool)
>>
>> If there's multiple possible valid values, I've come up with a couple
>> possible methods, but they all seem to be inefficient or kludges:
>>>>> valid = N.array(("a", "c"))
>>>>> (ar == valid[0]) | (ar == valid[1])
>> array([ True, False,  True, False, False,  True, False,  True,  True],
>> dtype=bool)
>>>>> N.array(map(lambda x: x in valid, ar))
>> array([ True, False,  True, False, False,  True, False,  True,  True],
>> dtype=bool)
>>
>> Is there a numpy-appropriate way to do this?
>>
>> Thanks,
>> Brett Olsen
>
> amap: Like Map, but for arrays.
>
>>>> ar = numpy.array(("a", "b", "c", "b", "b", "a", "d", "c", "a"))
>>>> valid = ('a', 'c')
>>>> numpy.amap(lambda x: x in valid, ar)
> array([ True, False,  True, False, False,  True, False,  True,  True],
> dtype=bool)

I'm not sure what version of numpy this would be in; I've never seen it.

But in any case, that would be very slow for large arrays since it
would invoke a Python function call for every value in ar. Instead,
iterate over the valid array, which is much shorter:

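import numpy as np

# Start with nothing selected, then OR in one vectorized comparison per valid value.
mask = np.zeros(ar.shape, dtype=bool)
for good in valid:
    mask |= (ar == good)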

Wrap that up into a function and you're good to go. That's about as
efficient as it gets unless the valid array gets large, since the loop
runs one whole-array comparison per valid value.
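
For illustration, here is a minimal sketch of what wrapping the loop in a
function might look like. The name mask_of_valid is made up for this
example and is not a numpy function. The closing comparison against
np.isin is an addition, not part of the original suggestion: np.isin
exists in newer numpy releases (1.13 and later) and may be the better
choice when the list of valid values is long.

```python
import numpy as np


def mask_of_valid(ar, valid):
    """Return a boolean array that is True where `ar` equals any value in `valid`.

    Hypothetical helper name; loops over `valid` (assumed short), not over `ar`.
    """
    ar = np.asarray(ar)
    mask = np.zeros(ar.shape, dtype=bool)
    for good in valid:
        # Each comparison is vectorized over the whole of `ar`.
        mask |= (ar == good)
    return mask


ar = np.array(["a", "b", "c", "b", "b", "a", "d", "c", "a"])
valid = ["a", "c"]
mask = mask_of_valid(ar, valid)

# Assumption: numpy >= 1.13, where np.isin is available as a built-in alternative.
same_as_isin = np.array_equal(mask, np.isin(ar, valid))
```

With the data from the original question, mask marks the "a" and "c"
entries, the same result as the two-comparison expression quoted above.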

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


