[Numpy-discussion] general newbie question about applying an arbitrary function to all elements in a numpy array

Robert Kern robert.kern at gmail.com
Tue May 3 20:07:44 EDT 2011


On Tue, May 3, 2011 at 18:58, Michael Katz <michaeladamkatz at yahoo.com> wrote:
> I have a basic question about applying functions to array elements. There is
> a rambling introduction here and then I get to the ACTUAL QUESTION about 2/3
> of the way down.
>
> RAMBLING INTRODUCTION
>
> I was trying to do a simple mapping of array values to other values. So I
> had:
>
>                     unmapped_colors = np.array((0,0,1,1,2,0,0,1))
>                     BNA_LAND_FEATURE_CODE = 0
>                     BNA_WATER_FEATURE_CODE = 1
>                     # color_to_int() produces some unsigned int value
>                     green = color_to_int( 0.25, 0.5, 0, 0.75 )
>                     blue = color_to_int( 0.0, 0.0, 0.5, 0.75 )
>                     gray = color_to_int( 0.5, 0.5, 0.5, 0.75 )
>                     def map_type_to_color( t ):
>                         if ( t == BNA_LAND_FEATURE_CODE ):
>                             return green
>                         elif ( t == BNA_WATER_FEATURE_CODE ):
>                             return blue
>                         else:
>                             return gray
>                     mapped_colors = np.vectorize( map_type_to_color )( unmapped_colors )
>
> I found that by using vectorize I got about a 3x speedup compared to using a
> python for loop to loop through the array. One small question is where
> that speedup comes from; i.e., what vectorize is actually doing for
> me. Is it just the time for the actual looping logic?

Pretty much.

> That seems like too
> much speedup. Does it also include a faster access of the array data items?
>
> So I was trying to find a "pure numpy" solution for this. I then learned
> about fancy indexing and boolean indexing, and saw that I could do a boolean
> array version:
>
>     mapped_colors = np.zeros(unmapped_colors.shape, dtype=np.uint32) + gray
>     mapped_colors[unmapped_colors==BNA_LAND_FEATURE_CODE] = green
>     mapped_colors[unmapped_colors==BNA_WATER_FEATURE_CODE] = blue
>
> and in fact I could do a "fancy indexing" version:
>
>     color_array = np.array((green, blue, gray), dtype=np.uint32)
>     mapped_colors = color_array[unmapped_colors]
>
> The boolean array version is pretty simple, but makes three "passes" over
> the data.
>
> The fancy indexing version is simple and one pass, but it relies on the fact
> that my constant values happen to be nice to use as array indexes. And in
> fact to use it in actual code I'd need to do one or more other passes to
> check unmapped_colors for any indexes < 0 or > 2.
>
> ACTUAL QUESTION
>
> So, the actual question here is, even though I was able to find a couple
> solutions for this particular problem, I am wondering in general about
> applying arbitrary functions to each array element in a "pure numpy" way,
> ideally using just a single pass.
>
> It seems to me like there should be some way to build up a "numpy
> compiled" function that you could then apply to all elements in an array. Is
> that what "universal functions" are about? That's what it sounded like to
> me at first, but after looking closer it didn't seem to be.

Universal functions, or ufuncs, are called universal because they will
accept any sequence that can be treated like an array. Roughly, if
np.array(x) and np.array(y) work, then np.add(x, y) will work
(assuming compatible dimensions, etc.). You do have to write C code in
order to make really fast ones, though. np.vectorize() makes a ufunc
from a Python function, but it still has a lot of Python overhead for
each function call.
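
For instance, here is a minimal sketch (re-using your map_type_to_color,
with placeholder ints standing in for the color_to_int() results) of how
the object np.vectorize() gives back behaves like a ufunc -- it broadcasts
over anything np.asarray() accepts -- while still calling your Python
function once per element:

    import numpy as np

    green, blue, gray = 1, 2, 3   # placeholders for color_to_int() values

    def map_type_to_color( t ):
        if t == 0:                # BNA_LAND_FEATURE_CODE
            return green
        elif t == 1:              # BNA_WATER_FEATURE_CODE
            return blue
        else:
            return gray

    # The wrapper broadcasts like a ufunc and accepts plain lists or
    # scalars, but map_type_to_color is still invoked per element in
    # Python, which is where the remaining overhead comes from.
    map_colors = np.vectorize( map_type_to_color, otypes=[np.uint32] )
    mapped = map_colors( [0, 0, 1, 1, 2, 0, 0, 1] )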

> I guess I was hoping vectorize was some magical thing that could compile
> certain python expressions down into "numpy primitives". It seems like as
> long as you stuck to a limited set of math and logical operations on arrays,
> even if those expressions included calls to subroutines that were similarly
> restricted, there should be an automated way to convert such expressions
> into the equivalent of a numpy built-in. I guess it's similar to cython, but
> it wouldn't require a separate file or special compilation. I could just say
>
>     def capped_sqrt( n ):
>         if ( n > 100 ):
>             return 10
>         else:
>             return sqrt( n )
>     f = numpy.compile( lambda x: 0 if ( x < 10 ) else capped_sqrt( x ) )
>     numpy.map( f, a )
>
> or something like that, and it would all happen in a single pass within
> numpy, with no "python code" being run.
>
> Is that something that exists?

Look at numexpr:

  http://code.google.com/p/numexpr/
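
For example, a quick sketch (assuming numexpr is installed; the nested
where() mirrors the capped_sqrt logic above):

    import numpy as np
    import numexpr as ne

    a = np.array([4.0, 9.0, 150.0, 25.0, 400.0])

    # numexpr compiles the string expression and evaluates it over the
    # whole array in a single pass, with no per-element Python calls.
    # where(cond, x, y) picks x where cond is true and y elsewhere, so
    # this is "0 if a < 10 else (10 if a > 100 else sqrt(a))", elementwise.
    result = ne.evaluate("where(a < 10, 0.0, where(a > 100, 10.0, sqrt(a)))")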

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


