[Numpy-discussion] Proposal for new ufunc functionality
Travis Oliphant
oliphant at enthought.com
Tue Apr 13 10:03:51 EDT 2010
On Apr 12, 2010, at 5:31 PM, Robert Kern wrote:
>>
>> We should collect all of these proposals into a NEP. Let me clarify
>> what I mean by "group-by" behavior.
>> Suppose I have an array of floats and an array of integers. Each element
>> in the array of integers represents a region in the float array of a
>> certain "kind". The reduction should take place over like-kind values:
>> Example:
>> add.reduceby(array=[1,2,3,4,5,6,7,8,9], by=[0,1,0,1,2,0,0,2,2])
>> results in the calculations:
>> 1 + 3 + 6 + 7
>> 2 + 4
>> 5 + 8 + 9
>> and therefore the output (notice the two arrays; perhaps a structured
>> array should be returned instead...)
>> [0,1,2],
>> [17, 6, 22]
>>
>> The real value is when you have tabular data and you want to do reductions
>> in one field based on values in another field. This happens all the time
>> in relational algebra and would be a relatively straightforward thing to
>> support in ufuncs.
>
> I might suggest a simplification where the by array must be an array
> of non-negative ints such that they are indices into the output. For
> example (note that I replace 2 with 3 and have no 2s in the by array):
>
> add.reduceby(array=[1,2,3,4,5,6,7,8,9], by=[0,1,0,1,3,0,0,3,3]) ==
> [17, 6, 0, 22]
>
> This basically generalizes bincount() to other binary ufuncs.
>
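For reference, the integer-index variant can be approximated with tools that exist in current NumPy. This is only a sketch: add.reduceby is a proposal, not an existing method, and the helper name reduceby below (and its size parameter) is hypothetical. It leans on np.bincount for the sum case and on ufunc.at, which was added to NumPy after this thread, for other binary ufuncs that define an identity.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
by = np.array([0, 1, 0, 1, 3, 0, 0, 3, 3])

# Sum case: bincount with weights already behaves this way (result is float64).
sums = np.bincount(by, weights=values)


def reduceby(ufunc, values, by, size=None):
    """Hypothetical helper: reduce `values` into bins indexed by `by`.

    Only works for binary ufuncs with a defined identity (add, multiply, ...).
    Bins that receive no values are left at the identity.
    """
    values = np.asarray(values)
    by = np.asarray(by)
    if ufunc.identity is None:
        raise ValueError("ufunc has no identity to initialize empty bins")
    if size is None:
        size = int(by.max()) + 1 if by.size else 0
    out = np.full(size, ufunc.identity, dtype=values.dtype)
    # Unbuffered in-place operation: repeated indices accumulate correctly.
    ufunc.at(out, by, values)
    return out


print(sums)                               # [17.  6.  0. 22.]
print(reduceby(np.add, values, by))       # [17  6  0 22]
print(reduceby(np.multiply, values, by))  # [126   8   1 360]
```

Note that bin 2 receives no values, so it holds the identity of the ufunc (0 for add, 1 for multiply).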
Interesting proposal. I do like having only one output.
I'm particularly interested in reductions with "by" arrays of
strings, i.e. something like:
add.reduceby([10,11,12,13,14,15,16],
by=['red','green','red','green','red','blue','blue'])
resulting in:
10+12+14
11+13
15+16
In practice, the strings would essentially have to be mapped to the kind of
integer array I used in the original example. So I suppose that if we
couple your proposal with the segment function from the rest of my
original proposal, the same functionality is available (perhaps with an
extra intermediate integer array that may not be strictly necessary).
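As a rough illustration of that mapping step, existing NumPy can already produce the intermediate integer array. The sketch below uses np.unique with return_inverse=True as a stand-in; it is not the segment function from the original proposal, and the variable names are only illustrative. The integer codes are then reduced with np.add.at.

```python
import numpy as np

values = np.array([10, 11, 12, 13, 14, 15, 16])
by = np.array(['red', 'green', 'red', 'green', 'red', 'blue', 'blue'])

# labels: sorted unique keys; codes: for each element, the index of its key.
labels, codes = np.unique(by, return_inverse=True)

# Reduce into one bin per label using the intermediate integer array.
sums = np.zeros(len(labels), dtype=values.dtype)
np.add.at(sums, codes, values)

for label, total in zip(labels, sums):
    print(label, total)
# blue 31
# green 24
# red 36
```

np.unique returns the keys in sorted order, so the groups come out as blue, green, red rather than in order of first appearance.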
But, having simple building blocks is usually better in the long run
(and typically leads to better optimizations by human programmers).
Thanks,
-Travis
--
Travis Oliphant
Enthought Inc.
1-512-536-1057
http://www.enthought.com
oliphant at enthought.com