inclusive-lower-bound, exclusive-upper-bound (was Re: Range Operation pre-PEP)

Fri May 11 17:47:30 EDT 2001

Tim Peters wrote:
> 
> [Andrew Maizels]
> > OK, next question: why does Python start indexes at zero?
> 
> Like C (and many other languages), Python views indices as *offsets* from the
> start of the sequence being indexed.  The element at the start of a sequence
> is clearly at offset 0, etc.  Note that since Python is very keen to make
> writing extension modules in C pleasant, it's quite a practical benefit that
> they have the same view of this.

And C gets it from the hardware, which is perfectly understandable. 
(Though C uses pointers for arrays and strings, which is hideous.)

Making it easy for implementors of C modules is not my primary goal in
Pixy, but it's worth considering.  Of course, the run-time interpreter
(Pixy is compiled to a byte-code) is written in C, so I need to look
after myself as well.

> > Your example would work perfectly well if the range returned
> > [1, 2, 3, 4] and the list was indexed starting with 1.  Basically,
> > range(4) has to produce a list of four items, we just differ on
> > what those items should be.
> 
> But sequences *are* indexed starting at 0 in Python, so having range(4)
> produce [1, 2, 3, 4] *in Python* would be, well, stupid.  The decisions
> aren't independent.

Sure, agreed.

> > I'm not just being difficult; I'm trying to design my own language,
> > and this is one of the things I have different to Python.  If I've
> > missed something where the Python way is superior, then I might want
> > to change my mind.
> 
> You can make it work either way, although (as above) there's reason to favor
> 0-based indexing if ease of talking between Pixy and C is interesting to you.
> Icon is a good example of a language with the same basic "indices point
> *between* elements" (== "indices are offsets") approach, but where indices
> start at 1.  In two respects this can be nicer:
>
> 1. The last element of a non-empty Icon list (or string) is (using
>    Python spelling) list[len(list)].  In Python, at the start, the
>    only way to spell it was list[len(list)-1].  That created its own
>    breed of "off by 1" errors.  But Python later grew meaning for
>    negative list indices too, and since then list[-1] is the best
>    way to get at the last list element.

Right.  However you do it, you must be consistent.  I like the negative
indexes in Python too, but I'm not sure if (or how) I'll add them to
Pixy.

But list[len(list)] still works if you have the index point to the
element rather than the gap - if you count from 1.
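
A minimal Python sketch of the spellings discussed above, for comparison.
The helper at_one_based is hypothetical (it is not a Python or Pixy
feature); it only illustrates that with element-pointing indices counted
from 1, index len(list) names the last element.

```python
items = ["a", "b", "c"]

# 0-based, offset-style indexing: the last element sits at len(items) - 1.
last_by_length = items[len(items) - 1]

# Negative indices count back from the end, so -1 is also the last element.
last_by_negative = items[-1]


def at_one_based(seq, i):
    """Hypothetical 1-based accessor: index 1 is the first element."""
    if not 1 <= i <= len(seq):
        raise IndexError("1-based index out of range")
    return seq[i - 1]


# With 1-based element indices, len(items) is the index of the last element.
last_one_based = at_one_based(items, len(items))
```

All three spellings pick out the same element; they differ only in where
the off-by-one temptation lives.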

> 2. Spelling "the point just beyond the end of the sequence" is
>    easier in Icon:  in Python that's index len(list), in Icon it's
>    index 0 (or *its* breed of off-by-1 temptation, len(list)+1).
>    That is, indices in the 0-based Python look like:
> 
>       x[0] x[1] x[2]
>      0    1    2    3  positive flavor
>     -3   -2   -1    3  negative flavor
> 
>    but in the 1-based Icon they're:
> 
>       x[1] x[2] x[3]
>      1    2    3    4  positive flavor
>     -3   -2   -1    0  negative flavor
> 
>    If only 1's-complement integer arithmetic had caught on, Python
>    could get rid of the "3 wart" in the lower-right corner by using
>    -0 instead <wink>.

Well, you could always use floating point; the IEEE standard supports
-0.  (Ewww!)

This is a more telling example of the advantage of pointing between
elements: you have a consistent notation for "before the first element"
and "after the last element", which Pixy (as currently designed) doesn't
have.  I'll have to take another look at Icon, it's been years since
I've played with it.
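
As a rough illustration of the "indices point between elements" view,
here is how Python's own slice notation behaves (Icon and Pixy are not
shown; this is only the Python side of the comparison).

```python
x = ["a", "b", "c"]

# Slice bounds name the gaps 0..len(x); x[i:j] is what lies between gaps i and j.
before_first = x[0:0]            # the empty spot before the first element
after_last = x[len(x):len(x)]    # the empty spot after the last element

# Negative flavor: -3, -2, -1 name the same gaps as 0, 1, 2.
same_slice = x[-3:-1] == x[0:2]

# There is no negative spelling for the gap after the last element,
# because -0 is just 0 for Python integers.
minus_zero_is_zero = x[-0:] == x[0:]

# Slice assignment at a gap inserts without replacing anything.
y = list(x)
y[0:0] = ["start"]
y[len(y):len(y)] = ["end"]
```

The last two lines show why a consistent "before the first" and "after
the last" position is handy: insertion at either end uses the same
notation as insertion anywhere else.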

> > The way I have things at the moment, in Pixy (my language), array
> > indexes default to start at 1, but can be declared to any range (like
> > Pascal).
> 
> Or Perl or Fortran77 or any number of other languages.  The flexibility
> creates its own problems, though; for example, how can I write a general
> routine in Pixy to iterate over the elements of a passed-in array? 

In Pixy you can easily determine the bounds of an array (and the number
of dimensions).  Any sequence type, indeed any collection, has bounds
and size properties, which may be static or dynamic.

> In Python
> (or C, or any number of other languages), I can always start indexing at 0.
> In Pascal you have to clutter the argument list by passing the array bounds
> as well as the array.  In Perl, the index base is a magical global vrbl and
> applies to *all* arrays, and then routines written *assuming* a particular
> base (not coincidentally, usually the author's favorite base <wink>) can work
> or fail depending on whether somebody else fiddled the global's value. 

Hmm.  I think I'll try to avoid that :)  ("Magic globals considered
harmful".)

> In
> Ada there are inquiry functions to *ask* an array what its declared bounds
> were; that allows writing general code without relying on globals or
> cluttering argument lists, but general code is wordy due to all the
> inquiries, and array objects have to allocate space to store the bounds info.

Space isn't really an issue unless you have zillions of tiny arrays;
wordiness is (or can be) a problem.  I'm trying not to create a new
COBOL (though Pixy's problem domain is similar).  In Pixy you can
iterate with "for x in y" just like Python (well, you can't at the
moment, because the compiler can't cope with anything much more
complicated than a := (x + y) / z, but that's just an implementation
detail).
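
To make the tradeoff concrete, here is a sketch in Python (not Pixy) of
an array that carries its own declared bounds.  The class BoundedArray
and its lower/upper properties are hypothetical names invented for this
illustration; they are not Pixy's actual design.

```python
class BoundedArray:
    """Hypothetical array with a declared inclusive index range."""

    def __init__(self, lower, values):
        self.lower = lower            # declared lower bound, stored per array
        self._values = list(values)

    @property
    def upper(self):
        # Inclusive upper bound, derived from the lower bound and the size.
        return self.lower + len(self._values) - 1

    def __len__(self):
        return len(self._values)

    def __getitem__(self, index):
        if not self.lower <= index <= self.upper:
            raise IndexError("index outside declared bounds")
        return self._values[index - self.lower]

    def __iter__(self):
        return iter(self._values)


def total_by_bounds(arr):
    """General routine that asks the array for its bounds (the wordy way)."""
    total = 0
    for i in range(arr.lower, arr.upper + 1):
        total += arr[i]
    return total


def total_by_iteration(arr):
    """General routine using 'for x in y'; the bounds never appear."""
    total = 0
    for value in arr:
        total += value
    return total


sales = BoundedArray(1995, [10, 20, 30])   # indexed 1995..1997
```

Both routines work for any declared range; the second one is the reason
"for x in y" takes most of the sting out of the bounds inquiries.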

> > Strings are indexed starting with 1 as well.
> 
> I should hope so.
> 
> > Is there a good reason not to do this?
> 
> If you can't think of at least three "good reasons" to do this *and* not to
> do this, learn some more languages.

Well, I can think of lots of reasons either way, but none of them are
obvious killers, so it boils down to the taste of the language designer
(me!).  Since I've never designed a general-purpose language before (I've
done a couple of macro languages) I want to make sure that I don't miss
something and get bitten by it later.  I've programmed in Basic, Pascal,
Logo, Fortran, Prolog, Perl, Python, Postscript, C, various assemblers,
various shells, the Progress 4GL (a proprietary language, but a good
one), and dabbled in others (Icon, Modula-2, Forth).  And I've probably
forgotten something.

The point about indexing *between* elements is a good one, and something
I'll have to think about.

I like Python (a lot!) but it's not ideal for my particular problem
domain.  The Pixy compiler is written in Python, by the way.  The idea
is to rewrite it in Pixy once it becomes powerful enough.  (It's a
pretty standard recursive-descent parser; I've never met a
compiler-building tool I liked.)

> There are almost no pure wins or pure
> losses in language design.

You're so right about no pure wins or pure losses; you set off with a
clean sheet of paper and almost immediately you're up to your neck in a
sea of tradeoffs.  Which is why no perfect language exists, which is why
language designers are still around and as busy as ever.

For example, I like Python's use of new-line as a statement separator,
but in Pixy you can embed relational database queries directly into your
code, and given the target application domain (business applications
like accounting, customer service, etc.), I expect they'll be one of the
commonest constructs in Pixy programs.  Unfortunately, queries have a
tendency to be quite lengthy, and trying to squash them into one line or
having line-continuation markers all over the place both seem like bad
ideas.  Which means that having a statement separator (or terminator) is
the better option, so I have to choose a character for it, which means
that character is either not available for other uses or has to have
some disambiguation mechanism to allow the compiler to work out what you
mean.  Oh, and I don't like semi-colons.  Don't know why; they just
irritate me.  (In code, that is; I'm perfectly happy with them in
English.)

And there's a constant fight between compact and expressive notation and
legibility: I don't expect the users of Pixy to be language experts, and
I've often seen that a mid-level programmer can work through a
three-hundred line implementation of an algorithm, but is completely
thrown by a thirty-line implementation - the ideas are just too densely
packed for them to take in.

> see-abc-for-why-a-newbie-friendly-language-is-a-bad-idea-and-c++-
>     for-why-it's-a-good-one<wink>-ly y'rs  - tim

To be successful, a language has to be accessible to newbies and useful
to experts.  Python does pretty well; I've looked at C++ and shuddered,
though I'm comfortable enough in C.  I looked at Java and laughed - a
language where printing "Hello world!" involves the sequence "public
static void" is some sort of joke, though I'm not sure what.

Andrew.
-- 
There's only one game in town.
You can't win.
You can't break even.
You can't quit the game.          -- The four laws of thermodynamics.