[Numpy-discussion] What is up with raw boolean indices (like a[False])?

Thu Aug 20 18:00:46 EDT 2020

Just to be clear, what exactly do you think should be deprecated?
Boolean scalar indices in general, or just boolean scalars combined
with other arrays, or something else?

Aaron Meurer

On Thu, Aug 20, 2020 at 3:56 PM Sebastian Berg
<sebastian at sipsolutions.net> wrote:
>
> On Thu, 2020-08-20 at 16:50 -0500, Sebastian Berg wrote:
> > On Thu, 2020-08-20 at 12:21 -0600, Aaron Meurer wrote:
> > > You're right. I was confusing the broadcasting logic for boolean
> > > arrays.
> > >
> > > However, I did find this example
> > >
> > > >>> np.arange(10).reshape((2, 5))[np.array([[0, 0, 0, 0, 0]], dtype=np.int64), False]
> > > Traceback (most recent call last):
> > >   File "<stdin>", line 1, in <module>
> > > IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (1,5) (0,)
> > >
> > > That certainly seems to imply there is some broadcasting being
> > > done.
> >
> > Yes, it broadcasts the array after converting it with `nonzero`, i.e.
> > it's much the same as:
> >
> >    indices = [[0, 0, 0, 0, 0]], *np.nonzero(False)
> >    indices = np.broadcast_arrays(*indices)
> >
> > will give the same result (see also `np.ix_` which converts booleans as
> > well for this reason, to give you outer indexing).
> > I was halfway through a mock-up/pseudo code, but thought you likely
> > wasn't sure it was ending up clear. It sounds like things are probably
> > falling into place for you (if they are not, let me know what might
> > help you):
>
> Sorry, editing error up there. In short, I hope those steps make sense to
> you. Note that the broadcasting is basically part of a later "integer only"
> indexing step, and the `nonzero` part is pre-processing.
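
To make the two stages concrete, here is a minimal sketch (an illustration, not code from the thread). It converts the `False` scalar to an integer index array first and then broadcasts, which reproduces the same kind of shape mismatch that the indexing operation reports. `np.flatnonzero` stands in for the `nonzero` call as an assumption of this sketch: it ravels its input first, so it accepts a 0-d boolean even where `np.nonzero` on a 0-d input is deprecated or rejected.

```python
import numpy as np

a = np.arange(10).reshape((2, 5))
int_index = np.array([[0, 0, 0, 0, 0]])   # shape (1, 5)

# Pre-processing: the boolean scalar becomes an integer index array.
false_as_int = np.flatnonzero(False)      # empty array, shape (0,)

# "Integer only" step: the index arrays have to broadcast together.
broadcast_error = None
try:
    np.broadcast_arrays(int_index, false_as_int)
except ValueError as exc:
    broadcast_error = exc

# The indexing operation itself fails for the same reason.
index_error = None
try:
    a[int_index, False]
except IndexError as exc:
    index_error = exc

print(type(broadcast_error).__name__, "|", type(index_error).__name__)
```

Shapes (1, 5) and (0,) cannot be broadcast together, so both the explicit broadcast and the indexing expression raise.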
>
> >
> > 1. Convert all boolean indices into a series of integer indices using
> >    `np.nonzero(index)`
> >
> > 2. For True/False scalars, that doesn't work: `nonzero` gave us an
> >    index array (which is good, we obviously want one), but we need to
> >    index into `boolean_index.ndim == 0` dimensions!
> >    So that won't work; the approach using `nonzero` cannot generalize
> >    here, although boolean indices generalize perfectly.
> >
> >    The solution to the dilemma is simple: If we have to index one
> >    dimension, but should be indexing zero, then we simply add that
> >    dimension to the original array (or at least pretend there was
> >    an additional dimension).
> >
> > 3. Do normal indexing with the result *including broadcasting*,
> >    we forget it was converted.
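
The three steps above can be sketched roughly as follows. This is an illustrative emulation under stated assumptions, not NumPy's actual implementation: the helper name `emulate_advanced_index` is hypothetical, and it only handles index tuples made of integer arrays, boolean arrays, and boolean scalars (no slices, `Ellipsis`, or `newaxis`).

```python
import numpy as np

def emulate_advanced_index(a, index):
    """Emulate a[index] for a tuple of integer/boolean arrays and
    boolean scalars, following steps 1-3 (hypothetical helper)."""
    a = np.asarray(a)
    int_indices = []
    axis = 0  # next axis of `a` that an index will consume
    for idx in index:
        idx = np.asarray(idx)
        if idx.dtype == np.bool_ and idx.ndim == 0:
            # Step 2: pretend the array has an extra length-1 axis here,
            # then index that axis with [0] (True) or [] (False).
            a = np.expand_dims(a, axis)
            int_indices.extend(np.nonzero(idx[np.newaxis]))
            axis += 1
        elif idx.dtype == np.bool_:
            # Step 1: one integer index array per boolean dimension.
            int_indices.extend(np.nonzero(idx))
            axis += idx.ndim
        else:
            int_indices.append(idx)
            axis += 1
    # Step 3: ordinary integer indexing, which broadcasts the index arrays.
    return a[tuple(int_indices)]

a = np.arange(10).reshape((2, 5))
print(emulate_advanced_index(a, (True,)).shape)   # (1, 2, 5)
print(emulate_advanced_index(a, (False,)).shape)  # (0, 2, 5)
```

Because step 3 is plain integer indexing, a shape mismatch between the converted index arrays surfaces there as an `IndexError`, just as it does for the real indexing expression.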
> >
> > The other way to solve it would be to always reshape the original array
> > to combine all axes being indexed by a single boolean index into one
> > axis and then index it using `np.flatnonzero`.  (But that would get a
> > different result if you try to broadcast!)
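
A minimal sketch of that reshape-based alternative, assuming a single boolean index applied to the leading axes (the helper name `index_with_single_mask` is hypothetical). A 0-d mask covers zero axes, so the merged axis has length 1, which is how the scalar case falls out of the same code path.

```python
import numpy as np

def index_with_single_mask(a, mask):
    """Apply one boolean index to the leading axes of `a` by merging
    those axes into one and indexing with flatnonzero (hypothetical)."""
    a = np.asarray(a)
    mask = np.asarray(mask, dtype=bool)
    # Merge the mask.ndim leading axes into a single axis of length
    # mask.size; for a 0-d mask this adds a new axis of length 1.
    merged = a.reshape((mask.size,) + a.shape[mask.ndim:])
    return merged[np.flatnonzero(mask)]

a = np.arange(24).reshape((2, 3, 4))
mask = a[..., 0] % 8 == 0          # boolean mask of shape (2, 3)

print(np.array_equal(index_with_single_mask(a, mask), a[mask]))  # True
print(index_with_single_mask(a, True).shape)                     # (1, 2, 3, 4)
print(index_with_single_mask(a, False).shape)                    # (0, 2, 3, 4)
```

The sketch deliberately ignores combinations with other index arrays, which is where the broadcasting behaviour would differ.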
> >
> >
> > In any case, I am not sure I would bother with making sense of this,
> > except for sport!
> > It's pretty much nonsense and I think the time understanding it is
> > probably better spent deprecating it.  The only reason I did not
> > deprecate it before is that I tried to be minimal in the changes
> > when I rewrote advanced indexing (and generalized boolean scalars
> > correctly) long ago.  That was likely the right start/choice at the
> > time, since there were much bigger fish to fry, but I do not think
> > anything is holding us back now.
> >
> > Cheers,
> >
> > Sebastian
> >
> >
> > > Aaron Meurer
> > >
> > > On Wed, Aug 19, 2020 at 6:55 PM Sebastian Berg
> > > <sebastian at sipsolutions.net> wrote:
> > > > On Wed, 2020-08-19 at 18:07 -0600, Aaron Meurer wrote:
> > > > > > > > 3. If you have multiple advanced indexing you get annoying
> > > > > > > >    broadcasting of all of these. That is *always* confusing for
> > > > > > > >    boolean indices. 0-D should not be too special there...
> > > > >
> > > > > OK, now that I am learning more about advanced indexing, this
> > > > > statement is confusing to me. It seems that scalar boolean
> > > > > indices do
> > > > > not broadcast. For example:
> > > >
> > > > Well, broadcasting means you broadcast the *nonzero result* unless
> > > > I am very confused... There is a reason I dismissed it. We could (and
> > > > arguably should) just deprecate it.  And I have doubts anyone would
> > > > even notice.
> > > >
> > > > > > >>> np.arange(2)[False, np.array([True, False])]
> > > > > > array([], dtype=int64)
> > > > > > >>> np.arange(2)[tuple(np.broadcast_arrays(False, np.array([True, False])))]
> > > > > > Traceback (most recent call last):
> > > > > >   File "<stdin>", line 1, in <module>
> > > > > > IndexError: too many indices for array: array is 1-dimensional, but 2
> > > > > > were indexed
> > > > >
> > > > > And indeed, the docs even say, as you noted, "the nonzero
> > > > > equivalence
> > > > > for Boolean arrays does not hold for zero dimensional boolean
> > > > > arrays,"
> > > > > which I guess also applies to the broadcasting.
> > > >
> > > > I actually think that probably also holds. Nonzero just behaves
> > > > weirdly for 0-D arrays (because it returns a tuple).
> > > > But since broadcasting the nonzero result is so weird, and since 0-D
> > > > booleans require some additional logic and don't generalize 100%
> > > > (code-wise), I won't rule out that there are differences.
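
For reference, a small sketch of why the tuple returned by `nonzero` is awkward in the 0-D case (an illustration, not from the thread). `nonzero` yields one index array per mask dimension, so a 0-d mask would logically map to an empty tuple, and indexing with an empty tuple does not do what a boolean scalar index does. The sketch avoids calling `np.nonzero` on a 0-d input, since that call is deprecated or rejected depending on the NumPy version.

```python
import numpy as np

a = np.arange(6).reshape((2, 3))

# For a mask with ndim >= 1, nonzero yields one index array per mask
# dimension, and indexing with that tuple matches boolean indexing.
mask = a % 2 == 0
nonzero_indices = np.nonzero(mask)
print(len(nonzero_indices) == mask.ndim)             # True
print(np.array_equal(a[mask], a[nonzero_indices]))   # True

# A 0-d mask has ndim == 0, so the same rule would call for an empty
# tuple. Indexing with () leaves the shape unchanged, whereas a boolean
# scalar adds a leading axis of length 1 (True) or 0 (False).
print(a[()].shape)     # (2, 3)
print(a[True].shape)   # (1, 2, 3)
print(a[False].shape)  # (0, 2, 3)
```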
> > > >
> > > > > From what I can tell, the logic is that all integer and boolean
> > > > > arrays
> > > >
> > > > Did you try that? Because as I said above, IIRC broadcasting the
> > > > boolean array without first calling `nonzero` isn't really what's going
> > > > on. And I don't know how it could be what's going on, since adding
> > > > dimensions to a boolean index would have many more implications?
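
The difference between the two orderings can be checked directly. This is a minimal sketch using the arrays from the example in this thread; `np.flatnonzero(False)` is an assumed stand-in for the integer index that the boolean scalar is converted to.

```python
import numpy as np

a = np.arange(2)
mask = np.array([True, False])

# Broadcasting the booleans first yields two boolean arrays of shape (2,).
# Used as indices, they would consume two dimensions of a 1-D array.
bools_first = np.broadcast_arrays(False, mask)
try:
    a[tuple(bools_first)]
    bools_first_failed = False
except IndexError:
    bools_first_failed = True

# Converting to integer indices first and broadcasting afterwards gives
# shapes (0,) and (1,), which broadcast to (0,): an empty result.
ints_first = (np.flatnonzero(False), *np.nonzero(mask))
broadcast_shape = np.broadcast(*ints_first).shape

print(bools_first_failed, broadcast_shape, a[False, mask].shape)
```

Only the second ordering agrees with the empty result that `np.arange(2)[False, np.array([True, False])]` returns.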
> > > >
> > > > - Sebastian
> > > >
> > > >
> > > > > (and scalar ints) are broadcast together, *except* for boolean
> > > > > scalars. Then the first boolean scalar is replaced with and(all
> > > > > boolean scalars) and the rest are removed from the index. Then
> > > > > that
> > > > > index adds a length 1 axis if it is True and 0 if it is False.
> > > > >
> > > > > > So they don't broadcast, but rather "fake broadcast". I still contend
> > > > > > that it would be much more useful if True were a synonym for newaxis
> > > > > > and False worked like newaxis but instead added a length 0 axis.
> > > > > > Alternately, True and False scalars should behave exactly like all
> > > > > > other boolean arrays with no exceptions (i.e., work like np.nonzero(),
> > > > > > broadcast, etc.). This would be less useful, but more consistent.
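
For comparison, a short sketch of how a lone boolean scalar index relates to `newaxis` today (an illustration, not from the thread). The shapes already match the proposal for a single scalar index; the visible difference is that `newaxis` is basic indexing and returns a view, while a boolean scalar goes through advanced indexing and returns a copy.

```python
import numpy as np

a = np.arange(10).reshape((2, 5))

with_newaxis = a[np.newaxis]
with_true = a[True]
with_false = a[False]

print(with_newaxis.shape, with_true.shape, with_false.shape)
# (1, 2, 5) (1, 2, 5) (0, 2, 5)

# Same values, but only the newaxis result shares memory with `a`.
print(np.array_equal(with_newaxis, with_true))  # True
print(np.shares_memory(a, with_newaxis))        # True
print(np.shares_memory(a, with_true))           # False
```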
> > > > >
> > > > > Aaron Meurer
> > > > > _______________________________________________
> > > > > NumPy-Discussion mailing list
> > > > > NumPy-Discussion at python.org
> > > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > >
> > > >
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion at python.org
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > >
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion