[Numpy-discussion] Proposal for new ufunc functionality

Mon Apr 12 18:35:48 EDT 2010

On 12 April 2010 18:26, Travis Oliphant <oliphant at enthought.com> wrote:

> We should collect all of these proposals into a NEP.

Or several NEPs, since I think they are quasi-orthogonal.

> To clarify what I mean by "group-by" behavior:
> Suppose I have an array of floats and an array of integers.   Each element
> in the array of integers represents a region in the float array of a certain
> "kind".   The reduction should take place over like-kind values:
> Example:
> add.reduceby(array=[1,2,3,4,5,6,7,8,9], by=[0,1,0,1,2,0,0,2,2])
> results in the calculations:
> 1 + 3 + 6 + 7
> 2 + 4
> 5 + 8 + 9
> and therefore the output (notice the two arrays --- perhaps a structured
> array should be returned instead...)
> [0,1,2],
> [17, 6, 22]
>
> The real value is when you have tabular data and you want to do reductions
> in one field based on values in another field.   This happens all the time
> in relational algebra and would be a relatively straightforward thing to
> support in ufuncs.

As an example, if I understand correctly, this would allow the
"histogram" functions to be replaced by a one-liner, e.g.:

add.reduceby(array=1, by=((A-min)*n/(max-min)).astype(int))
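
Since `reduceby` does not exist, here is a rough sketch of the same counting idea using `np.bincount`. The function name `histogram_by_bins` is made up for illustration, and it assumes max > min. One detail the one-liner glosses over: an element equal to max maps to bin n, so the sketch folds it into the last bin.

```python
import numpy as np

def histogram_by_bins(A, n):
    """Hypothetical helper: count elements of A in n equal-width bins over [min, max]."""
    A = np.asarray(A, dtype=float)
    lo, hi = A.min(), A.max()  # assumes hi > lo
    # Same bin-index expression as the one-liner above.
    idx = ((A - lo) * n / (hi - lo)).astype(int)
    # A == hi would give index n; place it in the last bin instead.
    idx = np.minimum(idx, n - 1)
    # Adding 1 per element at its bin index is just counting.
    return np.bincount(idx, minlength=n)

counts = histogram_by_bins([0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0], 4)
print(counts)  # [2 1 1 3]
```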

It would also be valuable to support output arguments of some sort, so
that, for example, reduceby could be used to accumulate values into an
output array at supplied indices. (I leave the value of using this on
matrix multiplication or arctan2 to the reader's imagination.)
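
As a sketch of what accumulating into an output array at supplied indices could look like: later NumPy releases provide `ufunc.at`, which performs an unbuffered in-place operation and so handles repeated indices. Using it here is my assumption about a reasonable analogue, not the interface proposed in this thread.

```python
import numpy as np

values = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9.])
index = np.array([0, 1, 0, 1, 2, 0, 0, 2, 2])

# Pre-allocated output array; each value is added at its supplied index.
out = np.zeros(3)
np.add.at(out, index, values)
print(out)  # [17.  6. 22.]

# Note: `out[index] += values` is not equivalent, because fancy-indexed
# in-place addition does not accumulate over repeated indices.
```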

Anne