[Python-ideas] Where did we go wrong with negative stride?

Mon Oct 28 04:05:11 CET 2013

[Steven D'Aprano <steve at pearwood.info>]
> ...
> I really like that view point, but it has a major problem. As
> beautifully elegant as the "cut between positions" model is for
> stride=1, it doesn't extend to non-unit strides. You cannot think about
> non-contiguous slices in terms of a pair of cuts at position <start> and
> <end>. I believe that the cleanest way to understand non-contiguous
> slices with stride > 1 is to think of array indices. That same model
> works for the negative stride case too.

"Cut between positions" in my view has nothing to do with the stride.
It's used to determine the portion of the sequence _to which_ the
stride applies.  From that portion, we take the first element, the
first + |stride|'th element, the first + |stride|*2'th element, and so
on.  Finally, if stride < 0, we reverse that sequence.  That's the
proposal.  It's simple and uniform.

> Further details below.

Not really needed - I already know exactly how slicing works in Python today ;-)

> ...
> But that's just a longer way of writing this:
>
>     s[1:8:2] => s[1] + s[3] + s[5] + s[7]
>     => 'bdfh'
>
> which I maintain is a cleaner way to think about non-unit step-sizes.

As above, so do I.  start:stop just delineates the subsequence to
which the stride applies.

> It's certainly *shorter* to think of indexing rather than repeated thin
> slices,

And I don't have "repeated thin slices" in mind at all.

> ....
> If you are expecting differently, then (I believe) you are expecting
> that slices are closed on the *left* (lowest number), open on the
> *right* (highest number). But that's not what slices do. (Whether they
> *should* do it is another story.)

Guido started this thread precisely to ask what they should do.  We
already know what they _do_ do ;-)

>>
>> So I would prefer that the i:j in s[i:j:k] _always_ specify the
>> positions in play:
>>
>>
>> if i < 0:
>>     i += len(s)  # same as now
>> if j < 0:
>>     j += len(s)  # same as now
>> if i >= j:
>>     the slice is empty!  # this is different - the sign of k is irrelevant
>> else:
>>     the slice indices selected will be
>>         i, i + abs(k), i + 2*abs(k), ...
>>     up to but not including j
>>     if k is negative, this index sequence will be taken in reverse order
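
For readers who want to experiment, here is a minimal sketch of the quoted pseudocode in today's Python. The names proposed_indices and proposed_slice are hypothetical helpers made up for illustration, and clamping i and j to the sequence bounds is an assumption the pseudocode does not spell out (it mirrors what slicing does now):

```python
def proposed_indices(n, i, j, k):
    """Indices that s[i:j:k] would select under the proposal (hypothetical helper).

    n is len(s).  Clamping to [0, n] is an assumption, not part of the
    quoted pseudocode.
    """
    if k == 0:
        raise ValueError("slice step cannot be zero")
    if i < 0:
        i += n  # same as now
    if j < 0:
        j += n  # same as now
    i = max(0, min(i, n))
    j = max(0, min(j, n))
    # range() is already empty when i >= j, whatever the sign of k.
    indices = list(range(i, j, abs(k)))
    if k < 0:
        indices.reverse()  # negative stride only reverses the result
    return indices


def proposed_slice(s, i, j, k):
    """Apply the proposed semantics to a string (hypothetical helper)."""
    return ''.join(s[x] for x in proposed_indices(len(s), i, j, k))


s = 'abcdefgh'
print(proposed_slice(s, 1, 8, 2))   # same elements as today's s[1:8:2]
print(proposed_slice(s, 1, 8, -2))  # the same elements, reversed
print(repr(s[1:8:-2]))              # today's semantics give an empty string
```

With today's slicing the same result can be spelled s[i:j][::abs(k)], followed by [::-1] when k is negative.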

> In other words, you want negative strides to just mean "reverse the
> slice"

If they're given a meaning at all.

>. Perhaps that would have been a good design. But we already have
> two good idioms for reversing slices:
>
> reversed(seq[start:stop:step])

I'm often annoyed by `reversed()`, since it returns an iterator and
doesn't preserve the type of its argument.

>>> reversed('abc')
<reversed object at 0x00C722D0>

Oops!  OK, let's turn it back into a string:

>>> str(_)
'<reversed object at 0x00C722D0>'

LOL!  It's enough to make a guy give up ;-)  Yes, I know ''.join(_)
would have worked.
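
A short sketch of the type problem, using only built-ins. The variable names are illustrative, and the exact repr of the iterator (including its address) will vary between runs and implementations:

```python
s = 'abc'

it = reversed(s)          # an iterator, not a str
print(type(it).__name__)  # the iterator's type name, not 'str'

# str() on the iterator gives its repr, not the reversed text.
as_repr = str(reversed(s))

# Rebuilding the original type has to be done by hand, per type.
as_text = ''.join(reversed(s))
as_list = list(reversed([1, 2, 3]))

# A negative-stride slice keeps the type of its operand.
sliced_text = s[::-1]
sliced_list = [1, 2, 3][::-1]

print(as_text, sliced_text)
```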

> seq[start:stop:step][::-1]

That's an improvement over seq[start:stop:-step]?  Na.

>> ...
>> So it's always a semi-open range, inclusive "at the left" and
>> exclusive "at the right".  But that's more a detail:

> It isn't a mere detail,

Not "mere", "more".

> it is the core of the change: changing from inclusive at the start
> to inclusive on the left,

No, the proposal says a[i:j:anything] is _empty_ if (after normalizing
negative i and/or negative j) i >= j.  "The start" and "the left" are
always the same thing under the proposal (where "the start" applies to
the input subsequence - which may be "the end" of the output
subsequence).

> which are not the same thing. This is a significant semantic change.

Yes, it is.

> (Of course it is. You don't like the current semantics, since they trick
> you into off-by-one errors for negative strides.

No, I dislike the current semantics for the same reason it appears
Guido dislikes them:  they're hard to teach, and hard for people to
get right in practice.

> If the change was insignificant, it wouldn't help.)

Bingo ;-)

> One consequence of this proposed change is that the <start> parameter is
> no longer always the first element returned. Sometimes <start> will be
> last rather than first. That disturbs me.

?  <start> is always the first element of the subsequence to which the
stride is applied.  If the stride is negative, then yes, of course the
first element of the source subsequence would be the last element of
the returned subsequence.

>> ...
>> Of course I'd change range() similarly.

> Currently, this is how you use range to count down from 10 to 1:
>
>     range(10, 0, -1)  # 0 is excluded
>
> To me, this makes perfect sense: I want to start counting at 10, so the
> first argument I give is 10 no matter whether I'm counting up or
> counting down.
>
> With your suggestion, we'd have:
>
>     range(1, 11, -1)  # 11 is excluded
>
> So here I have to put one more than the number I want to start with as
> the *second* argument, and the last number first, just because I'm
> counting down. I don't consider that an improvement. Certainly not an
> improvement worth breaking backwards compatibility for.

I agree this one is annoying.  Not _more_ annoying than the current
range(10, -1, -1) to count down from 10 through 0 - which I've seen
people get wrong more often than I can recall - but _as_ annoying.
reversed(range(1, 11)) would work for your case, and
reversed(range(11)) for mine.
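
To make the comparison concrete, here is a small sketch of the spellings discussed above. proposed_range is a hypothetical helper that emulates the suggested change on top of today's range(); it is not an existing function:

```python
def proposed_range(start, stop, step):
    """Emulate range() under the proposal (hypothetical helper).

    start:stop always names the low-to-high span; a negative step only
    reverses the order in which the values come out.
    """
    if step == 0:
        raise ValueError("step must not be zero")
    values = range(start, stop, abs(step))
    return list(reversed(values)) if step < 0 else list(values)


# Counting down from 10 to 1.
today_10_to_1 = list(range(10, 0, -1))         # current spelling
proposed_10_to_1 = proposed_range(1, 11, -1)   # spelling under the proposal
reversed_10_to_1 = list(reversed(range(1, 11)))

# Counting down from 10 through 0.
today_10_to_0 = list(range(10, -1, -1))        # the easy one to get wrong
reversed_10_to_0 = list(reversed(range(11)))

print(today_10_to_1 == proposed_10_to_1 == reversed_10_to_1)
print(today_10_to_0 == reversed_10_to_0)
```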