Slice confusion : a[n:p] is a list exclude the last element p

Mon Apr 28 12:17:16 EDT 2003

Alex> It's an important design principle, which I first saw proposed
Alex> in print by Andrew Koenig in his landmark book "C Traps and
Alex> Pitfalls": *all bounded loops should be first-bound-included,
Alex> last-bound-excluded* -- and which I will therefore call
Alex> "Koenig's Principle" although I have never seen the principle
Alex> *named* in the literature (I'mm CC'ing Andrew because of this --
Alex> I know there are already other things named after him, such as
Alex> the "Koenig Lookup" in C++, that he might like to get his name
Alex> AWAY from, so it seems only fair to allow him to disclaim THIS
Alex> one too ASAP if he dislikes it!-).

Nope, I'm not going to disclaim this one :-)

First, a bit of history: What is sometimes called ``Koenig lookup'' in
C++, and officially called ``argument-dependent lookup,'' is not my
invention.  My main association with it was to point out to the
standards committee that there was a problem that had to be solved,
that an existing proposal--I don't know the original author--would
solve the problem, and I knew of no other proposal that would solve
the problem.  Therefore, even though the proposal wasn't perfect,
adopting it would be an improvement.  I still feel that way, even
though the problems are greater than the committee realized at the
time they adopted it.  

--- Readers uninterested in C++ should skip the following ---

Here is an example that shows the magnitude of the problem:

    #include <iostream>

    int main()
    {
        std::cout << "Hello, world!\n";
        return 0;
    }

How does the compiler know where to look for << ?  The answer is that
<< is a member of the std::cout object, and it is overloaded with the
right-argument type of ``const char *,'' which is the type of the
literal "Hello, world!\n".

Now, let's change the program a tiny bit:

    #include <iostream>
    #include <string>

    int main()
    {
        std::string hello = "Hello, world!\n";
        std::cout << hello;
        return 0;
    }

Now, how does the compiler know where to look for <<?  The version of
<< with a right-argument type of ``std::string'' is not a member of
the std::cout object, because making it a member would create an
unnecessary dependency between the I/O and string classes.  But if <<
isn't a member, how does the compiler find it?  Why, in other words,
do we not have to write

        std::operator<<(std::cout, hello);

instead?

This question has to be answered somehow, and I still think that the
best answer available is to say that because the right argument of <<
has a type that is defined in namespace ``std,'' the compiler should
include namespace ``std'' in the list of places it searches for the
declaration of <<.

--- Readers uninterested in C++ can resume here ---

Now let's talk about how to express ranges.

It is certainly true that if you are expressing a range of (signed)
integers, you can use either the last element of the range or one past
the last element as your ending specification.  Similarly, but less
obviously, you can also use the first element or one before the first
element as your starting point.

So, for example, to express the integers 4, 5, 6, 7, you can write [4,
7], (3, 7], [4, 8), or (3, 8).  Here, the [ ] mean that the range
*includes* the number next to the bracket, and the ( ) mean that the
range *excludes* the number next to the parenthesis.  The question,
then, is which kind of range to prefer.

Alex has already pointed out that if you use asymmetric bounds, there
is the great convenience that the number of elements in the range is
equal to the difference between the bounds.  So, for example, if you
use (3, 7] or [4, 8), the difference between the bounds is 4 in either
case, and that's how many elements there are.  However, the real
convenience of using asymmetric bounds doesn't show up until you start
talking about ranges of values that don't have ordering comparisons
available.

Suppose, for example, that you have a singly-linked list of objects.
In Python, you might do that with a class with a ``next'' method that
yields either the next object in the list, or None if you've run off
the end.

Let's suppose that ``listhead'' refers to the first object in the
list (or is None if the list is empty).  Then you might count the
objects in the list this way:

        item = listhead
        count = 0
        while item is not None:
            count += 1
            item = item.next()

Now, suppose that you want to be able to represent an arbitrary
subsequence of the elements in this list.  How would you do so?
Presumably you're going to want to store a pair of references--but
references to what?  You *cannot* use a reference to the first and
last elements of the sequence, because if you do, you will find that
there is no way to represent an empty sequence.

What you can do is use a reference to the first element of the
subsequence and the first element beyond the subsequence.  Then
it all works out.  For example, you can use (listhead, None) to
represent the whole sequence, (listhead, listhead) to represent
the null subsequence at the beginning, (listhead, listhead.next())
to represent only the first element (assuming that there is a
first element), and so on.

Then, presuming that ``begin'' and ``end'' represent the (asymmetric)
bounds, you can process all of the elements in the range by writing

        item = begin
        while item is not end:
            # process item
            item = item.next()

Try writing this loop with symmetric bounds and you'll see why
I don't like them.

Now, of course, there is no requirement to use the same strategy
for integers and iterator-like values.  But what's the advantage
of two different strategies?  I don't see any.

>From the foregoing argument, I conclude that it's easiest
overall to represent *all* ranges asymmetrically: the first
element of the range and the first element beyond the range.

-- 
Andrew Koenig, ark at research.att.com, http://www.research.att.com/info/ark