Is this a dream or a nightmare? (Was Re: XML)
Tim Peters
tim_one at email.msn.com
Sat Oct 7 16:43:05 EDT 2000
[David T. Grove]
> ...
> There have been three things keeping me out of Python:
> ...
> 2) I can't mentally deal with [:5] when logically it's [:4]
> ...
> Actually, 2 I can probably deal with using a compatibility library
> and a function or two to make slices make sense until I'm used
> to them being off by one.
Actually, this part of Python is a dream. When I was learning C, I was
baffled by why it declared arrays with a number one larger than the largest
valid index, e.g.
    int a[10];
With a Fortran background at the time, "it was obvious" that arrays *should*
be declared with their minimum and maximum indices instead:
    INTEGER A(0:9)
Therefore, since C only allows 0 as its lower bound, the proper declaration
would be
    int a[9];
But of course C had nothing of the sort in mind, and 10 is simply the number
of elements in the array. Perfectly clear to everyone but me <wink>.
You're similarly suffering a bad illusion wrt Python slices: from the
Python POV, it would be insane to write [:4] when you want the first *five*
elements -- *that* would cause a boundless number of off-by-one errors.
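To see the difference in practice, here is a minimal sketch that contrasts Python's half-open slices with an inclusive-end alternative. The helper `inclusive_slice` is hypothetical, invented only for this comparison; it is not part of Python. With half-open bounds the split point appears unchanged on both sides of a seam. The inclusive version forces a + 1 or - 1 at every seam and at the end of the list.

```python
def inclusive_slice(seq, first, last):
    """Hypothetical helper: seq[first] through seq[last], both ends inclusive."""
    return seq[first:last + 1]

a = list(range(10))

# Half-open: the first five elements are a[:5], and the split point k
# appears unchanged on both sides of the seam.
k = 5
assert a[:5] == [0, 1, 2, 3, 4]
assert a[:k] + a[k:] == a

# Inclusive: the first five elements end at index 4, the second piece must
# start at last + 1, and the final index is len(a) - 1.
last = 4
assert inclusive_slice(a, 0, last) == [0, 1, 2, 3, 4]
assert inclusive_slice(a, 0, last) + inclusive_slice(a, last + 1, len(a) - 1) == a

# An empty inclusive range needs last == first - 1.
assert inclusive_slice(a, 3, 2) == [] == a[3:3]

# Forgetting the + 1 duplicates the seam element: a classic off-by-one.
print(inclusive_slice(a, 0, last) + inclusive_slice(a, last, len(a) - 1))
# prints [0, 1, 2, 3, 4, 4, 5, 6, 7, 8, 9]
```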
The trick is that indices in Python point *between* array elements:
    |a[0]|a[1]|a[2]|a[3]|a[4]|a[5]| ... |a[len(a)-1]|
    ^    ^    ^    ^    ^    ^    ^     ^           ^
    0    1    2    3    4    5    6 ... len(a)-1    len(a)
    <--------- a[:5] -------->
When you've got N adjacent boxes, there are N+1 "walls", and trying to map
those into the ints from 0 thru N-1 inclusive is necessarily ambiguous.
Slice notation names the walls, not the boxes.
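A short Python session makes the picture concrete. With N elements there are N+1 walls, numbered 0 through N, and any pair of walls i <= j picks out the j - i elements between them. The list `a` below is only sample data chosen for illustration.

```python
a = ["a0", "a1", "a2", "a3", "a4", "a5", "a6"]
n = len(a)

# N boxes, N+1 walls: the valid slice bounds are 0 .. n inclusive.
walls = list(range(n + 1))
assert len(walls) == n + 1

# a[:5] runs from wall 0 to wall 5, i.e. the first five boxes.
assert a[:5] == ["a0", "a1", "a2", "a3", "a4"]

# Wall n (== len(a)) is a legal slice bound even though a[n] is not a box.
assert a[n:] == []
try:
    a[n]
except IndexError:
    print("a[len(a)] is not a box, but len(a) is a wall")

# Every pair of walls i <= j names exactly j - i boxes.
for i in walls:
    for j in walls[i:]:
        assert len(a[i:j]) == j - i
```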
Picture the slice bounds as *intended*, as identifying the gaps between
elements, and there's a unique obvious int in the range 0 thru N inclusive
to identify each wall, and you'll find that off-by-one errors are much less
common in Python than in C or Perl. In general, a[i:j] contains j-i
elements, and a[i:j] == a[i:k] + a[k:j], and those two cover a universe of
practical problems. Give it a chance, and the notion that, e.g., a[i:i+2]
could contain *three* elements will become abhorrent.
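The two identities above can be checked mechanically. This is a minimal sketch, assuming bounds in the ordinary range 0 <= i <= k <= j <= len(a) (negative indices and out-of-range bounds are left out); the function name `check_slice_identities` is hypothetical, chosen for this example.

```python
def check_slice_identities(a):
    """Check both slice identities for every 0 <= i <= k <= j <= len(a)."""
    n = len(a)
    for i in range(n + 1):
        for j in range(i, n + 1):
            # a[i:j] contains j - i elements.
            assert len(a[i:j]) == j - i
            for k in range(i, j + 1):
                # Splitting at any wall k between i and j neither loses
                # nor duplicates an element.
                assert a[i:j] == a[i:k] + a[k:j]

# Exhaustively check small lists, including the empty list.
for n in range(8):
    check_slice_identities(list(range(n)))

# A two-element window really is two elements, wherever it starts.
a = list("python")
pairs = [a[i:i + 2] for i in range(0, len(a), 2)]
assert pairs == [["p", "y"], ["t", "h"], ["o", "n"]]
```

Because the windows share their walls, concatenating `pairs` gives back `a` with no adjustment at the seams.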
The same "indices point between" idea is implicit in Java and the C++ STL,
and explicit in the Icon language, for the same reasons: in practice, it
works better.
when-things-aren't-clear-turn-on-a-light-ly y'rs - tim
More information about the Python-list mailing list