[Numpy-discussion] ufunc and nditer flags (was Re: Code Freeze for NumPy 1.7)

Sun Jul 15 16:29:01 EDT 2012

On Jul 15, 2012, at 2:23 PM, Nathaniel Smith wrote:

> On Sun, Jul 15, 2012 at 6:18 PM, jay bourque <jay.bourque at continuum.io> wrote:
>> Just added PR #359. The purpose is to allow the nditer object operand and
>> iter flags to be set for a ufunc to provide better control over how an array
>> is iterated over by a ufunc and how the ufunc uses the operands passed to
>> it. One specific motivation for this is to be able to specify an input
>> operand to a ufunc as being read/write instead of read only.
> 
> Huh. My first gut reaction to this is that it's an argument *against*
> merging this change, because ufuncs *shouldn't* be writing to their
> inputs. Maybe I'm wrong, but... obviously there is more context here
> than we've heard so far. Can you explain what you're actually trying
> to accomplish?
> 

This is a generalization that allows ufuncs to be more flexible. It's particularly important because the changes to the ufunc implementation in 1.6, where a lot more buffering takes place, have changed the implicit behavior that some users were relying on.

In particular, several NumPy users have assumed that they could treat "read-only" inputs as "read-write" and modify the inputs in the ufunc for a variety of reasons (to hold state, to implement interesting functions that depend on the order in which they are called, etc.). With the changes in 1.6 to the way ufuncs are buffered, their code broke because buffered inputs were not copied back to the underlying arrays after the ufunc was called.

It would be great if such people would use this list to communicate their concerns in more detail, but some are not able to. That doesn't mean their concerns are invalid or should not be considered. We can argue that people "should not" be using ufuncs in that way, or we can look at whether or not it makes sense to have input-and-output arguments for ufuncs. It's helpful to remember that ufuncs can be more general than the simple unary and binary ones that most people are used to.

Fortran has "inout" arguments for its subroutines, which is an argument for the general utility of such a device in programming. If we want ufunc kernels to grow beyond element-wise, or be used with structured arrays, etc., then allowing a ufunc to be created that defines arguments as inout seems reasonable. We already "sort-of" have the ability to define "inout" arguments in that one can pass an output array into a ufunc and it can be pre-filled with whatever one wants (and one can use the data in the "out" array as if it were input). But this is also a hack, I think. I think it's better to just allow the user to specify their intent so that the nditer buffering mechanism can do the right thing with arrays that are inputs, arrays that are outputs, and arrays that are specified as *both* input and output.

My view is that intelligent programmers have found a use-case for treating ufunc arguments as inout. This is a general paradigm that exists in other languages for scientific computing. We already have the ability to specify an "out" parameter, which can be abused for this sort of thing as well, but I'd rather let people be explicit about it so that we can reason correctly in the future about what people are trying to do. This will be especially useful as more "generalized ufunc kernels" get written.

Thus, I think it makes a lot of sense to allow people to be explicit about the intent of arguments as inout instead of trying to find loopholes in the current implementation to get what they want.

Thanks,

-Travis