[SciPy-User] Most efficient way to eliminate or keep specified features from labeled image

Wed Apr 6 17:52:09 EDT 2011

On Apr 6, 2011, at 5:40 PM, Zachary Pincus wrote:

>> I have a 2D image labeled with scipy.ndimage.label and a list of
>> integers that identify features I would like to eliminate (set to
>> 0), or keep (set the others to 0). I can think of several ways to do
>> this (nested loops, use generic_filter, etc.). Some techniques might
>> provide better performance than others, depending on the size of the
>> image, the number of features to eliminate/keep, etc. Is there a
>> technique that provides good performance across a wide range of
>> scenarios?
>>
>> This seems like one of those problems where the answer should be
>> obvious—there should be some function specifically for it—but I have
>> not come across it yet.
>
> What I usually do in this situation is to make and then or (or and, as
> required) together multiple boolean masks. Say I want to pick out the
> regions of arr that are either 1- or 2-valued:
>
> mask = (arr == 1) | (arr == 2)
> arr *= mask
>
> This obviously scales badly when there are many values one wants to
> pick out. I don't know of any numpy function that compiles a boolean
> mask with "values-in" sort of logic, (e.g. mask = values_in(arr,
> [1,2]) ) but there might be some way to achieve this functionality.
> Alternately, that would be very simple to implement in cython as an
> ad-hoc solution. (And it could be extremely fast if the dtype of the
> input array were limited to 16-bit or something so a lookup-table
> approach to doing the "in" test would suffice, instead of having to
> iterate through the list at each array element.)

A second's thought suggests that numpy.choose is perfect here!

a = numpy.arange(16).reshape((4,4))

In [71]: a

Out[71]:
array([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15]])

In [72]: a.choose([0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1])

Out[72]:
array([[0, 0, 0, 1],
        [0, 0, 0, 1],
        [0, 0, 0, 1],
        [0, 0, 0, 1]])

Basically, as input to choose, you pass an array that is one element
longer than the maximum label (so that the maximum label is itself a
valid index), with ones at the indices of the labels you want to keep
and zeros elsewhere. Given a list of the labels you want, it's easy to
make such an array with fancy indexing:

In [76]: z = numpy.zeros(16, dtype=int)

In [77]: z[[4,6,2]] = 1

In [78]: z

Out[78]: array([0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [79]: a.choose(z)

Out[79]:
array([[0, 0, 1, 0],
        [1, 0, 1, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]])

so to keep labels with values from a list:
keep_labels = [1,2,5]
# one slot per label value 0..max, so the largest label is a valid index
choices = numpy.zeros(array.max() + 1, dtype=int)
choices[keep_labels] = 1
array *= array.choose(choices)
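
One caveat worth hedging: numpy.choose has historically capped the number of choices it accepts (32 in the NumPy 1.x releases I am aware of), so the recipe above may raise a ValueError on images with many labels. A minimal sketch of the same lookup-table idea using plain fancy indexing, which does not have that cap, is below. The name filter_labels and its keep flag are made up for illustration, and it assumes the labels are non-negative integers no larger than the maximum value in the image:

```python
import numpy

def filter_labels(labeled, labels, keep=True):
    """Return a copy of `labeled` with only `labels` kept (or removed)."""
    labeled = numpy.asarray(labeled)
    # One boolean slot per possible label value, 0..max inclusive.
    lut = numpy.zeros(labeled.max() + 1, dtype=bool)
    lut[labels] = True
    if not keep:
        # Invert the table to eliminate the listed labels instead.
        lut = ~lut
    # Indexing the table with the label image gives a same-shaped mask.
    mask = lut[labeled]
    return numpy.where(mask, labeled, 0)

a = numpy.arange(16).reshape((4, 4))
kept = filter_labels(a, [4, 6, 2])                 # keep only 2, 4 and 6
dropped = filter_labels(a, [4, 6, 2], keep=False)  # zero out 2, 4 and 6
```

The table lookup is a single pass over the image regardless of how many labels are listed, which is the property the original question was after.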

There might be even more compact ways of generating the choices array...
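
For what it's worth, one candidate for "more compact" is to skip the choices array entirely and build the mask with numpy.isin, which does the "values-in" test directly. This is only a sketch: numpy.isin is available in NumPy 1.13 and later as far as I know, and on older versions numpy.in1d on the raveled array, reshaped back afterwards, should do the same job.

```python
import numpy

a = numpy.arange(16).reshape((4, 4))
keep_labels = [4, 6, 2]

# Boolean mask, same shape as a: True where the value is in keep_labels.
mask = numpy.isin(a, keep_labels)

kept = a * mask      # keep the listed labels, zero everything else
dropped = a * ~mask  # eliminate the listed labels, keep everything else
```

I have not timed this against the lookup-table approach, so which one is faster for a given image size and number of labels is something to measure rather than assume.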

Zach