Newbie: The philosophy behind list indexes

Sat Jun 15 01:49:33 EDT 2013

On Sat, Jun 15, 2013 at 3:21 PM,  <ian.l.cameron at gmail.com> wrote:
> What is the thinking behind stopping 'one short' when slicing or iterating through lists?
>
> By example;
>
> >>> a=[0,1,2,3,4,5,6]
> >>> a
> [0, 1, 2, 3, 4, 5, 6]
> >>> a[2:5]
> [2, 3, 4]
>
> To my mind, it makes more sense to go to 5. I'm sure there's a good reason, but I'm worried it will result in a lot of 'one-off' errors for me, so I need to get my head around the philosophy of this behaviour, and where else it is observed (or not observed.)

There are two equally plausible ways to identify positions in a
list/string/whatever. One is to number the elements, the other to
number the gaps between them. I'll try my hand at some ASCII art... be
sure to view this in a monospaced font.

[  0  ,  1  ,  2  ,  3  ,  4  ,  5  ,  6  ]
|     |     |     |     |     |     |     |
0     1     2     3     4     5     6     7
-7   -6    -5    -4    -3    -2    -1    (0)

When you ask for the slice from 2 to 5, you get the elements between
those slot markers. That's [2,3,4].

When you ask for negative indices, the same applies, only there's no
parallel way to ask for negative 0 aka end of list. [1]

>>> a[2:-2]
[2, 3, 4]
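
To tie the diagram to actual code, here's a small sketch. The helper
gap() is just a name I made up for illustration, not anything built
in; it converts a negative slot number to the non-negative one that
marks the same gap.

```python
a = [0, 1, 2, 3, 4, 5, 6]

def gap(seq, i):
    """Hypothetical helper: map a negative gap number to its
    non-negative equivalent for the given sequence."""
    return i if i >= 0 else len(seq) + i

# All three slices name the same pair of gaps, so they pick out
# the same elements.
first = a[2:5]
second = a[-5:-2]
third = a[2:-2]
print(first, second, third)      # [2, 3, 4] [2, 3, 4] [2, 3, 4]

# Gap -5 is the same place as gap 2, and gap -2 the same as gap 5.
print(gap(a, -5), gap(a, -2))    # 2 5
```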

There are a number of reasons for working this way. For instance, the
length of the slice a[x:y] is simply y-x (as long as x <= y and both
fall within the list, negative indices aside). It's even more
significant when you look at something that doesn't have discrete
units - such as times.
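
To make the length point concrete before moving on to times, here's a
quick sketch using the list from the original question. It also shows
that range() follows the same convention, which partly answers "where
else is it observed".

```python
a = [0, 1, 2, 3, 4, 5, 6]

# The length of a slice is just the difference of its endpoints.
x, y = 2, 5
slice_len = len(a[x:y])
print(slice_len == y - x)        # True

# Splitting at any gap gives two pieces that join back together
# with no overlap and nothing missing.
rejoined = [a[:n] + a[n:] for n in range(len(a) + 1)]
print(all(piece == a for piece in rejoined))   # True

# range() stops one short in exactly the same way.
print(list(range(2, 5)))         # [2, 3, 4]
```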

Suppose you invent a data type to represent a time range. You might
describe a TV show as lasting from 10:00 till 10:30; but what do you
really mean by those times? Do you mean from the start of 10:00 until
the end of 10:30? When is the end of 10:30? Is it the end of the
minute 10:30, the end of the second 10:30:00, the end of the
millisecond 10:30:00.000? Easier to describe it as the beginning of
that moment, because that has the same meaning regardless of your
resolution. You can always add more trailing zeroes to either the
start time or the end time, without changing the meaning of the range.
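
Here's roughly how that might look with datetime. This is only a
sketch: in_slot() and the show times are invented for the example, and
a real scheduling type would need a lot more than this.

```python
from datetime import datetime, timedelta

def in_slot(start, end, moment):
    """Hypothetical helper: True if moment lies in the half-open
    range [start, end), i.e. start included, end excluded."""
    return start <= moment < end

show_start = datetime(2013, 6, 15, 10, 0)
show_end = datetime(2013, 6, 15, 10, 30)
next_end = datetime(2013, 6, 15, 11, 0)

# 10:30 exactly belongs to the next show, not to this one.
print(in_slot(show_start, show_end, show_end))   # False
print(in_slot(show_end, next_end, show_end))     # True

# However fine the resolution, the instant just before 10:30 is
# still inside the first show.
just_before = show_end - timedelta(microseconds=1)
print(in_slot(show_start, show_end, just_before))  # True

# And the duration is a plain subtraction, like y-x for slices.
duration = show_end - show_start
print(duration)                                   # 0:30:00
```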

The same applies to the generation of random numbers. If you have a
function that generates a random number uniformly in the range [0,1) -
that is, including 0 but not including 1 - and you multiply it by an
integer x and truncate the fractional part, you get a random integer
uniformly in the range [0,x), which is an extremely useful thing. You
don't even need to care what the actual resolution of the RNG is (does
it produce 0.000 through 0.999, or 0.000000 through 0.999999?), as
long as it's significantly finer than your target range. But if the
RNG could return 1.0, then you'd need to deal with the possibility of
getting x itself as a result, which frankly isn't much use.
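
A minimal sketch of that, using random.random(), which is documented
to return a float in [0.0, 1.0). The function rand_below() and its rng
parameter are made up for the illustration; in real code you'd more
likely reach for random.randrange(), which avoids the small bias this
approach can have for very large ranges.

```python
import random

def rand_below(x, rng=random.random):
    """Hypothetical helper: scale a float from [0, 1) up to an
    integer in [0, x) by multiplying and truncating."""
    return int(rng() * x)

# Simulate a six-sided die (0 through 5) a few thousand times.
counts = [0] * 6
for _ in range(6000):
    counts[rand_below(6)] += 1
print(counts)   # six roughly equal counts; exact values vary

# If the generator could return 1.0, the result would fall outside
# the list of valid indexes. A fake RNG shows the problem.
print(rand_below(6, rng=lambda: 1.0))   # 6
```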

It takes some getting used to, perhaps, given that most people in the
real world work with closed ranges; but ultimately it makes far more
sense. And if it weren't for a huge case of lock-in, I would wish we
could change the way Scripture references are interpreted, for the
same reasons. Taking examples from tomorrow's church service, two of
the readings are Matthew 18:15-20 and Philippians 3:8-10. When you
look in a Bible, you'll find verse numbers preceding the verses (at
least, that's the convention in most editions). If the ranges were
written as half-open (eg Matt 18:15-21), it would be simply from verse
marker 15 to verse marker 21; and "to the end of the chapter" or "to
the end of the book" would have obvious notations (eg Matt 18:15-19:1
or Matt 27-Mark 1). Of course, this would make for a huge amount of
confusion, since the present system has been around for centuries...
but it would make more sense, so I'm very glad that's the way Python
chose to do it :)

ChrisA

[1] You can use None or omit the index, but there's no "negative 0"
integer to use.
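
A short sketch of what that means in practice. The drop_last() helper
is a made-up name; the "-n or None" idiom inside it is one common way
to cope with n being zero, not the only one.

```python
a = [0, 1, 2, 3, 4, 5, 6]

# Omitting the end, or passing None, both mean "to the end".
to_end_omitted = a[2:]
to_end_none = a[2:None]
print(to_end_omitted == to_end_none)   # True

# But -0 is just 0, so this asks for gap 2 up to gap 0: nothing.
not_to_end = a[2:-0]
print(not_to_end)                      # []

def drop_last(seq, n):
    """Hypothetical helper: return seq without its last n items.
    When n is 0, -n is falsy, so the end index becomes None."""
    return seq[:-n or None]

print(drop_last(a, 2))   # [0, 1, 2, 3, 4]
print(drop_last(a, 0))   # [0, 1, 2, 3, 4, 5, 6]
```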