[Numpy-discussion] Add an axis argument to generalized ufuncs?

Sun Oct 19 09:43:02 EDT 2014

On Sun, Oct 19, 2014 at 8:25 AM, Stephan Hoyer <shoyer at gmail.com> wrote:
> On Sat, Oct 18, 2014 at 6:46 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>
>> One thing we'll have to watch out for is that for reduction operations
>> (which are basically gufuncs with (n)->() signatures), we already
>> allow axis=(0,1) to mean "reshape axes 0 and 1 together into one big
>> axis, and then use that as the gufunc core axis". I don't know if
>> we'll ever want to support this functionality for gufuncs in general,
>> but we shouldn't rule it out with the syntax.
>
>
> This is a great point.
>
> In fact, I think supporting this sort of functionality for gufuncs would be
> quite valuable, since there are plenty of reduction operations that can't
> fit into the model provided by ufunc.reduce. An excellent example is
> np.median, which currently can only act on either one axis or an entire
> flattened array.
>
> If the syntax (m?,n),(n,p?)->(m?,p?) is accepted, then I think the natural
> extension to reduction operators that can act on one or more axes would be
> (n+)->() (this is regex syntax).
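The reshaping semantics quoted above, where reducing over axes 0 and 1 means merging them into one long axis and reducing over that, can be sketched in Python. The helper `reduce_over_axes` is hypothetical, not a NumPy API. It moves the requested axes to the end, reshapes them into a single axis, and then calls any function that accepts `axis=-1`. This is one way to get multi-axis behaviour out of a single-axis reduction such as `np.median`.

```python
import numpy as np


def reduce_over_axes(func, arr, axes):
    """Hypothetical helper: apply a single-axis reduction over several axes.

    The requested axes are moved to the end and reshaped into one long
    axis, which `func` then reduces via axis=-1.
    """
    arr = np.asarray(arr)
    axes = tuple(ax % arr.ndim for ax in axes)        # normalize negative axes
    kept = [ax for ax in range(arr.ndim) if ax not in axes]
    moved = np.transpose(arr, kept + list(axes))      # reduced axes go last
    merged = moved.reshape(moved.shape[:len(kept)] + (-1,))
    return func(merged, axis=-1)


a = np.arange(24).reshape(2, 3, 4)

# Ufunc reductions already accept a tuple of axes; the reshape trick agrees.
print(np.add.reduce(a, axis=(0, 1)))               # [60 66 72 78]
print(reduce_over_axes(np.sum, a, (0, 1)))         # [60 66 72 78]

# The same trick lets a single-axis function such as np.median act on
# several axes at once.
print(reduce_over_axes(np.median, a, (0, 1)))      # [10. 11. 12. 13.]
```

For a plain ufunc reduction the tuple form and the reshape trick should agree. The interesting case is a function that only understands one axis, which is what a `(n+)->()`-style signature would generalize.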

It's not clear we even need to alter the signature here -- the
reduction operations don't bother distinguishing between reductions
that make sense in this case (the commutative ones) and the ones that
don't (everything else), they just trust that no-one will try doing
something like np.subtract.reduce(arr, axis=(0, 1)) because it's
meaningless.
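To see why a multi-axis reduction is only well-defined for commutative, associative operations, compare "merge the axes, then reduce" with "reduce one axis, then the other". The sketch below uses hypothetical helper names. It deliberately calls `reduce` with one axis at a time rather than passing `axis=(0, 1)` to `np.subtract.reduce`, because how NumPy handles that call for non-commutative ufuncs may depend on the version.

```python
import numpy as np

a = np.arange(4).reshape(2, 2)  # [[0, 1], [2, 3]]


def flat_then_reduce(ufunc, arr):
    # Merge axes 0 and 1 into one axis, then reduce along it.
    return ufunc.reduce(arr.reshape(-1))


def nested_reduce(ufunc, arr):
    # Reduce axis 0 first, then reduce what is left.
    return ufunc.reduce(ufunc.reduce(arr, axis=0), axis=0)


# For addition the two orders agree: 0 + 1 + 2 + 3 == 6.
assert flat_then_reduce(np.add, a) == nested_reduce(np.add, a) == 6

# For subtraction they do not: ((0 - 1) - 2) - 3 == -6,
# but (0 - 2) - (1 - 3) == 0.
assert flat_then_reduce(np.subtract, a) == -6
assert nested_reduce(np.subtract, a) == 0
```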

Providing some basic checks here might be useful, though, given that
gufunc signatures can be much more complicated than just (n)->().

> Actually, using an axis keyword seems like the only elegant way to
> disambiguate cases like this.
>
>>
>> One option would be to add a new argument axes=... for gufunc core
>> specification, and say that axis=foo is an alias for axes=[[foo]].
>
>
> Indeed, this is exactly what I was thinking. The "canonical form" for the
> axis argument would be doubly nested tuples, but if an integer or unnested
> tuple is encountered, additional nesting should be added until reaching
> canonical form, e.g., axis=0 -> axis=(0,) -> axis=((0,),).
>
> The only particularly tricky case will be scenarios like my second one,
> axis=(0, 1) for (n),(m)->() or (n,m)->(). To deal with cases like this, the
> parsing will need to take the gufunc signature into consideration, and start
> by asking whether or not the tuple is of the right size to match each function
> argument separately.
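Here is a minimal sketch of the signature-aware expansion described above. It assumes the doubly nested canonical form (one tuple of core axes per input). `canonicalize_axis` and `count_inputs` are hypothetical names. The rule is one literal reading of the proposal: a flat tuple is split per argument if its length equals the number of inputs, and is otherwise treated as the core axes of a single input.

```python
import re


def count_inputs(signature):
    """Count the input operands in a gufunc signature like "(n),(m)->()"."""
    inputs = signature.split("->")[0]
    return len(re.findall(r"\([^)]*\)", inputs))


def canonicalize_axis(axis, signature):
    """Hypothetical: expand `axis` to one tuple of core axes per input."""
    if isinstance(axis, int):
        axis = (axis,)                       # axis=0 -> axis=(0,)
    if all(isinstance(ax, int) for ax in axis):
        if len(axis) == count_inputs(signature):
            # Right size to match each argument separately:
            # (0, 1) -> ((0,), (1,))
            return tuple((ax,) for ax in axis)
        # Otherwise read it as the core axes of a single argument:
        # (0, 1) -> ((0, 1),)
        return (tuple(axis),)
    # Already (partly) nested: promote any bare ints to 1-tuples.
    return tuple((ax,) if isinstance(ax, int) else tuple(ax) for ax in axis)


print(canonicalize_axis(0, "(n)->()"))            # ((0,),)
print(canonicalize_axis((0, 1), "(n),(m)->()"))   # ((0,), (1,))
print(canonicalize_axis((0, 1), "(n,m)->()"))     # ((0, 1),)
```

The same `axis=(0, 1)` expands to different canonical forms depending only on the signature string. Resolving it correctly requires knowing that signature.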

Right, the problem with (0, 1) in this system is that you can either
read it as being a single reshaping axis description and expand it to
((0, 1),), or you can read it as being two non-reshaping axis
descriptions and expand it to ((0,), (1,)).

I feel strongly that we should come up with a syntax that is
unambiguous even *without* looking at the gufunc signature. It's easy
for the computer to disambiguate stuff like this, but it'd be cruel to
ask people trying to skim through code to work out the signature and
then simulate the disambiguation algorithm in their head.

Notice in my suggestion above there are two different kwargs, "axis" and "axes".
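With two separate keywords, the expansion never needs the signature. The sketch below follows one possible reading of the `axes=[[foo]]` alias. The outer list has one entry per argument, the inner list has one entry per core dimension, and each innermost entry is either a single axis or a tuple of axes to be reshaped together. `resolve_axes` is a hypothetical helper, not an existing NumPy function.

```python
def resolve_axes(axis=None, axes=None):
    """Hypothetical helper: canonicalize the proposed axis=/axes= keywords.

    Returns one list per argument; each list holds, for every core
    dimension, the tuple of array axes reshaped together to form it.
    The gufunc signature is never consulted.
    """
    if axis is not None and axes is not None:
        raise TypeError("pass either axis= or axes=, not both")
    if axis is not None:
        # axis=foo is treated as an alias for axes=[[foo]].
        axes = [[axis]]
    if axes is None:
        return None
    return [[(d,) if isinstance(d, int) else tuple(d) for d in arg] for arg in axes]


print(resolve_axes(axis=0))             # [[(0,)]]
print(resolve_axes(axis=(0, 1)))        # [[(0, 1)]]  merge axes 0 and 1
print(resolve_axes(axes=[[0], [1]]))    # [[(0,)], [(1,)]]  e.g. for (n),(m)->()
print(resolve_axes(axes=[[0, 1]]))      # [[(0,), (1,)]]    e.g. for (n,m)->()
```

Each spelling maps to a distinct canonical form. A reader can therefore tell from the call site alone whether axes are being merged or assigned to separate arguments.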

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org