[Numpy-discussion] NEP 21: Simplified and explicit advanced indexing

Robert Kern robert.kern at gmail.com
Wed Jun 27 01:21:49 EDT 2018


On Tue, Jun 26, 2018 at 9:50 PM Stephan Hoyer <shoyer at gmail.com> wrote:

> On Tue, Jun 26, 2018 at 12:46 AM Robert Kern <robert.kern at gmail.com>
> wrote:
>
>> I think having more self-contained descriptions of the semantics of each
>> of these would be a good idea. The current description of `.vindex` spends
>> more time talking about what it doesn't do, compared to the other methods,
>> than what it does.
>>
>
> Will do.
>
>
>> I'm still leaning towards not warning on current, unproblematic common
>> uses. It's unnecessary churn for currently working, understandable code. I
>> would still reserve warnings and deprecation for the cases where the
>> current behavior gives us something that no one wants. Those are the real
>> traps that people need to be warned away from.
>>
>> If someone is mixing slices and integer indices, that's a really good
>> sign that they thought indexing behaved in a different way (e.g. orthogonal
>> indexing).
>>
>
> I agree, but I'm still not entirely sure where to draw the line on
> behavior that should issue a warning. Some options, in roughly descending
> order of severity:
> 1. Warn if [] would give a different result than .oindex[]. This is the
> current proposal in the NEP, but based on the feedback we should hold back
> on it for now.
> 2. Warn if there is a mixture of arrays/slice objects in indices for [],
> even implicitly (e.g., including arr[idx] when it is equivalent to arr[idx,
> :]). In this case, indices end up at the end both for legacy_index and
> vindex, but arguably that is only a happy coincidence.
>

I'd have to deep dive through my email archive to double check, but I'm
pretty sure this is intentional design, not coincidence. There is a
long-standing pattern of using the first axes as the "collection" axes when
the objects that we are concerned with are vectors or matrices or more. For
example, evaluate a scalar field on a grid in 3D space (nx, ny, nz), then
the gradient at those points is usually represented as (nx, ny, nz, 3). It
is desirable to be able to apply the same indices to the scalar grid and
the vector grid to select out the scalar and vector values at the same set
of points. It's why we implicitly tack on empty slices to the end of any
partial index tuple (e.g. with just integer scalars).

The current rules for mixing slices and integer array indices are possibly
the simplest way to effect this use case; it is the behaviors for the other
cases that are the unhappy coincidences.
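
To make the "collection axes" pattern concrete, here is a minimal sketch.
The grid sizes, the random data standing in for a real field and its
gradient, and the variable names are all assumptions chosen for
illustration. It shows the same index arrays selecting matching points from
a scalar grid and a vector grid, with the trailing axis left alone.

```python
import numpy as np

# Hypothetical grid sizes; any values work.
nx, ny, nz = 4, 5, 6
rng = np.random.default_rng(0)

scalar = rng.random((nx, ny, nz))        # scalar field sampled on the grid
gradient = rng.random((nx, ny, nz, 3))   # stand-in for the gradient at each point

# Three grid points, given as one integer array per grid axis.
ix = np.array([0, 1, 3])
iy = np.array([2, 2, 4])
iz = np.array([5, 0, 1])

# The same partial index tuple works on both arrays.
scalar_at_points = scalar[ix, iy, iz]      # shape (3,)
gradient_at_points = gradient[ix, iy, iz]  # shape (3, 3); a trailing ":" is implied

# A single integer array on the first axis keeps the remaining axes in place.
planes = gradient[ix]                      # same as gradient[ix, :, :, :]
```

In this sketch the selected points run along the first axis of each result,
and the vector components stay on the last axis.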

> 3. Warn if [] would give a different result from .vindex[]. This is a
> little weaker than the previous condition, because arr[idx, :] or arr[idx,
> ...] would not give a warning. However, cases like arr[..., idx] or arr[:,
> idx, :] would still start to give warnings, even though they are arguably
> well defined according to either outer indexing (if idx.ndim == 1) or
> legacy indexing (due to dimension reordering rules that will be omitted
> from vindex).
> 4. Warn if there are multiple arrays/integer indices separated by a slice
> object, e.g., arr[idx1, :, idx2]. This is the edge case that really trips
> up users.
>
> As I said in my other response, in the long term, I would prefer to either
> (a) drop support for vectorized indexing in [] or (b) if we stick with
> supporting vectorized indexing in [], at least ensure consistent dimension
> ordering rules for [] and vindex[]. That would suggest using either my
> proposed rule 2 or 3.
>
> I also agree with you that anyone mixing slices and integers probably is
> confused about how indexing works, at least in edge cases. But given the
> lengths that legacy indexing goes to to support "outer indexing-like"
> behavior in the common case of a single integer array and many slices, I am
> hesitant to start warning in this case. The result of arr[..., idx, :] is
> relatively easy to understand, even though it uses its own set of rules,
> which happen to be more consistent with oindex[] than vindex[].
>
> We certainly could make the conservative choice of only adopting 4 for now
> and leaving further cleanup for later. I guess this uncertainty about
> whether direct indexing should be more like vindex[] or oindex[] in the
> long term is a good argument for holding off on other warnings for now. But
> I think we are almost certainly going to want to make further
> warnings/deprecations of some form.
>

I'd prefer 4 and could be talked into 3, but I don't think anything higher
on the list is a good idea.
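
For reference, a small sketch of the case that option 4 targets, using
current NumPy indexing only. The array shape and the index arrays are
made-up values, picked so that every axis length is distinct and the result
shapes are easy to read.

```python
import numpy as np

arr = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

# Hypothetical index arrays, each of length 6, valid for axes 1, 2 and 3.
idx_a = np.array([0, 1, 2, 0, 1, 2])   # indexes axis 1 (length 3)
idx_b = np.array([0, 1, 2, 3, 0, 1])   # indexes axis 2 (length 4)
idx_c = np.array([0, 1, 2, 3, 4, 0])   # indexes axis 3 (length 5)

# Adjacent array indices: the broadcast axis replaces them in place.
adjacent = arr[:, idx_a, idx_b, :]     # shape (2, 6, 5)

# Array indices separated by a slice: the broadcast axis moves to the front.
separated = arr[:, idx_a, :, idx_c]    # shape (6, 2, 4)

# An integer scalar counts as an advanced index here too.
mixed = arr[0, :, idx_b]               # shape (6, 3, 5)
two_step = arr[0][:, idx_b]            # shape (3, 6, 5)
```

The last two lines select the same elements, but the one-step form puts the
indexed axis first while the two-step form leaves it in the middle.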

-- 
Robert Kern