Is this a dream or a nightmare? (Was Re: XML)

Tim Peters tim_one at email.msn.com
Sat Oct 7 16:43:05 EDT 2000


[David T. Grove]
> ...
> There have been three things keeping me out of Python:
> ...
> 2) I can't mentally deal with [:5] when logically it's [:4]
> ...
> Actually, 2 I can probably deal with using a compatibility library
> and a function or two to make slices make sense until I'm used
> to them being off by one.

Actually, this part of Python is a dream.  When I was learning C, I was
baffled by why it declared arrays with a number one larger than the largest
valid index, e.g.

    int a[10];

With a Fortran background at the time, "it was obvious" that arrays *should*
be declared with their minimum and maximum indices instead:

      INTEGER A(0:9)

Therefore, since C only allows 0 as its lower bound, the proper declaration
would be

    int a[9];

But of course C had nothing of the sort in mind, and 10 is simply the number
of elements in the array.  Perfectly clear to everyone but me <wink>.

You're similarly suffering a bad illusion wrt Python slices:  from the
Python POV, it would be insane to write [:4] when you want the first *five*
elements -- *that* would cause a boundless number of off-by-one errors.

The trick is that indices in Python point *between* array elements:

   |a[0]|a[1]|a[2]|a[3]|a[4]|a[5]|  ... |a[len(a)-1]|
   ^    ^    ^    ^    ^    ^    ^      ^           ^
   0    1    2    3    4    5    6  ... len(a)-1    len(a)
   <--------- a[:5] -------->

When you've got N adjacent boxes, there are N+1 "walls", and trying to map
those into the ints from 0 thru N-1 inclusive is necessarily ambiguous.
Slice notation names the walls, not the boxes.
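
To make the walls concrete, here is a small sketch.  The list contents and
the names `a`, `splits` and `first_five` are made up for illustration; the
only thing it leans on is ordinary slice behavior:

```python
# Six boxes, so seven walls, numbered 0 thru 6.
a = ['a0', 'a1', 'a2', 'a3', 'a4', 'a5']

# Every wall is a legal place to cut:  wall w leaves the w boxes to
# its left in one piece and the len(a)-w boxes to its right in the other.
splits = [(a[:w], a[w:]) for w in range(len(a) + 1)]

# "Everything left of wall 5" is the first five boxes.
first_five = a[:5]
```

Note that `range(len(a) + 1)` is needed to visit every wall, since there is
one more wall than there are boxes.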

Picture the slice bounds as *intended* -- as identifying the gaps between
elements -- and there's a unique obvious int in the range 0 thru N inclusive
to identify each wall.  Do that, and you'll find that off-by-one errors are
much less common in Python than in C or Perl.  In general, for
0 <= i <= k <= j <= len(a), a[i:j] contains j-i elements, and
a[i:j] == a[i:k] + a[k:j]; those two identities cover a universe of
practical problems.  Give it a chance, and the notion that, e.g., a[i:i+2]
could contain *three* elements will become abhorrent.
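
Those two identities are easy to check by brute force.  The following is
only a sketch:  the helper name `check_slice_identities` is hypothetical,
and it assumes the bounds stay in the range 0 <= i <= k <= j <= len(a):

```python
def check_slice_identities(a):
    """Check both slice identities for every in-range i <= k <= j."""
    n = len(a)
    for i in range(n + 1):
        for j in range(i, n + 1):
            # A slice from wall i to wall j holds exactly j-i boxes.
            assert len(a[i:j]) == j - i
            for k in range(i, j + 1):
                # Cutting at any wall k in between loses nothing
                # and duplicates nothing.
                assert a[i:j] == a[i:k] + a[k:j]
    return True

a = list(range(10))
ok = check_slice_identities(a)

# a[i:i+2] holds two elements, never three.
pair = a[3:3 + 2]
```

The same check works on strings and tuples, since they slice the same way.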

The same "indices point between" idea is implicit in Java and the C++ STL,
and explicit in the Icon language, for the same reasons:  in practice, it
works better.

when-things-aren't-clear-turn-on-a-light-ly y'rs  - tim





