[Numpy-discussion] nanargmax failure case (was: Re: [SciPy-Dev] 1.8.0rc1)

Tue Oct 1 12:33:58 EDT 2013

On Tue, Oct 1, 2013 at 10:19 AM, <josef.pktd at gmail.com> wrote:

> On Tue, Oct 1, 2013 at 10:47 AM, Nathaniel Smith <njs at pobox.com> wrote:
> > On Tue, Oct 1, 2013 at 3:20 PM, Charles R Harris
> > <charlesr.harris at gmail.com> wrote:
> >>
> >>
> >>
> >> On Tue, Oct 1, 2013 at 8:12 AM, Nathaniel Smith <njs at pobox.com> wrote:
> >>>
> >>> [switching subject to break out from the giant 1.8.0rc1 thread]
> >>>
> >>> On Tue, Oct 1, 2013 at 2:52 PM, Charles R Harris
> >>> <charlesr.harris at gmail.com> wrote:
> >>> >
> >>> >
> >>> >
> >>> > On Tue, Oct 1, 2013 at 7:25 AM, Nathaniel Smith <njs at pobox.com> wrote:
> >>> >>
> >>> >> On Tue, Oct 1, 2013 at 1:56 PM, Charles R Harris
> >>> >> <charlesr.harris at gmail.com> wrote:
> >>> >> > On Tue, Oct 1, 2013 at 4:43 AM, Nathaniel Smith <njs at pobox.com>
> >>> >> > wrote:
> >>> >> >>
> >>> >> >> On Mon, Sep 30, 2013 at 10:51 PM, Christoph Gohlke <cgohlke at uci.edu>
> >>> >> >> wrote:
> >>> >> >> > 2) Bottleneck 0.7.0
> >>> >> >> >
> >>> >> >> > https://github.com/kwgoodman/bottleneck/issues/71#issuecomment-25331701
> >>> >> >>
> >>> >> >> I can't tell if these are real bugs in numpy, or tests checking that
> >>> >> >> bottleneck is bug-for-bug compatible with old numpy and we just fixed
> >>> >> >> some bugs, or what. It's clearly something to do with the
> >>> >> >> nanarg{max,min} rewrite -- @charris, do you know what's going on
> >>> >> >> here?
> >>> >> >>
> >>> >> >
> >>> >> > Yes ;) The previous behaviour of nanarg for all-nan axis was to cast nan
> >>> >> > to intp when the result was an array, and return nan when a scalar. The
> >>> >> > current behaviour is to return the most negative value of intp as an error
> >>> >> > marker in both cases and raise a warning. It is a change in behavior, but I
> >>> >> > think one that needs to be made.
> >>> >>
> >>> >> Ah, okay! I kind of lost track of the nanfunc changes by the end there.
> >>> >>
> >>> >> So for the bottleneck issue, it sounds like the problem is just that
> >>> >> bottleneck is still emulating the old numpy behaviour in this corner
> >>> >> case, which isn't really a problem. So we don't really need to worry
> >>> >> about that, both behaviours are correct, just maybe out of sync.
> >>> >>
> >>> >> I'm a little dubious about this "make up some weird value that will
> >>> >> *probably* blow up if people try to use it without checking, and also
> >>> >> raise a warning" thing, wouldn't it make more sense to just raise an
> >>> >> error? That's what exceptions are for? I guess I should have said
> >>> >> something earlier though...
> >>> >>
> >>> >
> >>> > I figure the blowup is safe, as we can't allocate arrays big enough that
> >>> > the minimum intp value would be a valid index. I considered raising an
> >>> > error, and if there is a consensus the behavior could be changed. Or we
> >>> > could add a keyword to determine the behavior.
> >>>
> >>> Yeah, the intp value can't be a valid index, so that covers 95% of
> >>> cases, but I'm worried about that other 5%. It could still pass
> >>> silently as the endpoint of a slice, or participate in some sort of
> >>> integer arithmetic calculation, etc. I assume you also share this
> >>> worry to some extent or you wouldn't have put in the warning ;-).
> >>>
> >>> I guess the bigger question is, why would we *not* use the standard
> >>> method for signaling an exceptional condition here, i.e., exceptions?
> >>> That way we're 100% guaranteed that if people aren't prepared to
> >>> handle it then they'll at least know something has gone wrong, and if
> >>> they are prepared to handle it then it's very easy and standard, just
> >>> use try/except. Right now I guess you have to check for the special
> >>> value, but also do something to silence warnings, but just for that
> >>> one line? Sounds kind of complicated...
> >>
> >>
> >> The main reason was for the case of multiple axis, where some of the results
> >> would be valid and others not. The simple thing might be to raise an
> >> exception but keep the current return values so that users could determine
> >> where the problem occurred.
> >
> > Oh, duh, yes, right, now I remember this discussion. Sorry for being slow.
> >
> > In the past we've *always* raised an error in the multiple axis case,
> > right? Has anyone ever complained? Wanting to get all
> > nanargmax/nanargmin results, of which some might be errors, without
> > just writing a loop, seems like a pretty exotic case to me, so I'm not
> > sure we should optimize for it at the expense of returning
> > possibly-misleading results in the scalar case.
> >
> > Like (I think) you say, we could get the best of both worlds by
> > encoding the results in the same way we do right now, but then raise
> > an exception and attach the results to the exception so they can be
> > retrieved if wanted. Kind of cumbersome, but maybe good?
> >
> > This is a more general problem though of course -- we've run into it
> > in the gufunc linalg code too, where there's some question about what
> > you do in e.g. chol() if some sub-matrices are positive-definite and
> > some are not.
> >
> > Off the top of my head the general solution might be to define a
> > MultiError exception type that has a standard generic format for
> > describing such things. It'd need a mask saying which values were
> > valid, rather than encoding them directly into the return values --
> > otherwise we have the problem where nanargmax wants to use INT_MIN,
> > chol wants to use NaN, and maybe the next function along doesn't have
> > any usable flag value available at all. So probably more thought is
> > needed before nailing down exactly how we handle such "partial" errors
> > for vectorized functions.
> >
> > In the short term (i.e., 1.8.0), maybe we should defer this discussion
> > by simply raising a regular ValueError for nanarg functions on all
> > errors? That's not a regression from 1.7, since 1.7 also didn't
> > provide any way to get at partial results in the event of an error,
> > and it leaves us in a good position to solve the more general problem
> > later.
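
For reference, a rough sketch of what the MultiError-with-a-mask idea quoted
above might look like. Nothing here is NumPy API: the MultiError class, its
`results` and `valid` attributes, and `nanargmax_rows_or_raise` are all
made up for illustration, and plain lists stand in for arrays.

```python
import math


class MultiError(ValueError):
    """Hypothetical exception carrying partial results and a validity mask."""

    def __init__(self, message, results, valid):
        super().__init__(message)
        self.results = results  # one entry per row; None where invalid
        self.valid = valid      # True where the matching result is usable


def nanargmax_rows_or_raise(rows):
    """Index of the largest non-NaN entry per row; raise if any row is all NaN."""
    results, valid = [], []
    for row in rows:
        idx = [i for i, x in enumerate(row) if not math.isnan(x)]
        valid.append(bool(idx))
        results.append(max(idx, key=lambda i: row[i]) if idx else None)
    if not all(valid):
        # Failures are reported through the mask, not through a flag value.
        raise MultiError("all-NaN row encountered", results, valid)
    return results


nan = float("nan")
try:
    nanargmax_rows_or_raise([[1.0, nan, 3.0], [nan, nan, nan]])
except MultiError as err:
    # A caller who wants the partial results can still get at them.
    recovered = [r for r, ok in zip(err.results, err.valid) if ok]
```

The point of the mask is that no function needs a spare flag value, so it
would work equally for integer and floating point results.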
>
> Can we make the error optional in these cases?
>
> like np.seterr for zerodivision, invalid, or floating point errors
> that allows ignore and raise
> np.seterr(linalg='ignore')
>
> I don't know about nanarg, but thinking about some applications for
> gufunc linalg code.
>
> In some cases I might require for example invertibility of all
> matrices and raise if one fails,
> in other case I would be happy with nans, and just sum the results
> with nansum for example or replace them by some fill value.
>
> I'm thinking warnings might be more flexible than exceptions:

import warnings

with warnings.catch_warnings():
    # Turn warnings into exceptions inside this block only
    warnings.simplefilter('error')
    ...
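
To spell that out a bit, here is a minimal sketch in plain Python. The names
`nanargmax_rows`, `AllNanWarning` and `SENTINEL` are hypothetical stand-ins,
not NumPy API; the function just mimics the behaviour described earlier in
the thread (most negative integer as a marker, plus a warning) so that the
caller decides whether to ignore or raise.

```python
import math
import sys
import warnings

# Hypothetical marker standing in for the most negative intp value.
SENTINEL = -sys.maxsize - 1


class AllNanWarning(RuntimeWarning):
    """Hypothetical warning category for an all-NaN row."""


def nanargmax_rows(rows):
    """Index of the largest non-NaN entry per row.

    Rows with no non-NaN entry get SENTINEL, and one warning is issued.
    """
    result = []
    for row in rows:
        valid = [i for i, x in enumerate(row) if not math.isnan(x)]
        if valid:
            result.append(max(valid, key=lambda i: row[i]))
        else:
            result.append(SENTINEL)
    if SENTINEL in result:
        warnings.warn("all-NaN row encountered", AllNanWarning)
    return result


nan = float("nan")
data = [[1.0, nan, 3.0], [nan, nan, nan]]

# Caller who wants partial results: silence the warning, check the marker.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", AllNanWarning)
    partial = nanargmax_rows(data)

# Caller who wants a hard failure: promote the warning to an exception.
with warnings.catch_warnings():
    warnings.simplefilter("error", AllNanWarning)
    try:
        nanargmax_rows(data)
        raised = False
    except AllNanWarning:
        raised = True
```

Using a dedicated warning category means the filter only touches this one
condition and leaves other warnings alone.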

Chuck