[Python-ideas] Tail Call Optimization -- natural? intuitive?

Mon Jan 20 15:29:47 CET 2014

On 01/20/2014 02:09 PM, Chris Angelico wrote:
> On Mon, Jan 20, 2014 at 11:12 PM, spir <denis.spir at gmail.com> wrote:
>> For instance, closed intervals are more intuitive or natural, obviously (but
>> for some reason I don't know). If you ask someone to count from 1 to 9, you
>> will probably be surprised to hear him/her start from 2 or stop after 8. If
>> you are asked to choose a letter between c and g, you will probably be
>> surprised to hear that 'c' or 'g' is no good choice.
>
> I'm not so sure about that. The half-open interval makes as much sense
> as the fully closed - all you have to do is interpret the indices as
> being *between* elements. Take, for example, Scripture verses. (Quotes
> taken from THE HOLY BIBLE, NEW INTERNATIONAL VERSION®, NIV® Copyright
> © 1973, 1978, 1984, 2011 by Biblica, Inc.® Used by permission. All
> rights reserved worldwide. Copyright notice included for license
> compliance. Note that I'm using bracketed numbers to indicate the
> beginnings of verses - in a printed Bible, these would normally be in
> superscript.)
>
> John 14:
> [31] To the Jews who had believed him, Jesus said, “If you hold to my
> teaching, you are really my disciples. [32] Then you will know the
> truth, and the truth will set you free.” [33]
>
> This passage is normally referred to as "John 14:31-32", but as you
> see, the verse marker [32] is in the middle of the quote. Using a
> half-open interval, this would start at "John 14:31" and end at "John
> 14:33". Half-open means: "Begin at the beginning, go on till you come
> to the end, then stop", as the King of Hearts instructed the White
> Rabbit.
>
> It's easy to indicate the beginning of a chapter: your start reference
> is verse 1. Here's the beginning of the account of the creation of the
> world:
>
> [1] In the beginning God created the heavens and the earth. [2] Now
> the earth was formless and empty, darkness was over the surface of the
> deep, and the Spirit of God was hovering over the waters. [3] And God
> said, “Let there be light,” and there was light. [4] God saw that the
> light was good, and he separated the light from the darkness. [5] God
> called the light “day,” and the darkness he called “night.” And there
> was evening, and there was morning—the first day. [6]
>
> Common parlance: Genesis 1:1-5. Half-open: Genesis 1:1-6. Conclusion:
> Tie. No argument to be made for either side. But what if you're
> looking at the *end* of a chapter? Here are a few verses from later on
> in Genesis 1:
>
> [29] Then God said, “I give you every seed-bearing plant on the face
> of the whole earth and every tree that has fruit with seed in it. They
> will be yours for food. [30] And to all the beasts of the earth and
> all the birds of the air and all the creatures that move on the
> ground—everything that has the breath of life in it—I give every green
> plant for food.” And it was so. [31] God saw all that he had made, and
> it was very good. And there was evening, and there was morning—the
> sixth day.
>
> Common parlance: Genesis 1:29-31. Half-open: Genesis 1:29-2:1. It's
> much more obvious by the latter that this passage extends exactly to
> the end of the chapter.

I do agree with your reasoning, it is indeed totally logical. However, it is not 
at all intuitive or natural (maybe tis is why Bible refs do not work your way 
;-) dunno).

This is probably related to the issue of prog indices interpreted as ordinals 
[*] or offsets. Aparently, obviously in fact, people intuitively or naturally 
interpret them as ordinals; which breaks your logic or conflicts with it. 
Whether it's "much more obvious" (quoting you in the last parag above) is a also 
question of how you interpret indices: if they're ordinals for you, then Genesis 
1:29-31 is perfectly clear on where the ref'ed passage stops.

[*] "ordinal" in the mathematical or linguistic sense, meaning a natural number 
holding the rank of an item in a sequence (not python's ord())

> Obviously it's way WAY too late to change the way Bible references are
> written, any more than Melway could renumber their maps all of a
> sudden. Massive case of lock-in and backward-incompatibility with
> existing code. But I put it to you that the half-open would make at
> least as much sense as the closed, in any situation where there are
> boundaries with contents between them.
>
> Note, by the way, that I'm not looking at anything involving backward
> scanning or wider strides, both of which Python's slice notation
> supports. Neither of those is inherently real-world intuitive, so the
> exact semantics can be defined as whatever makes sense in code. (And
> there was some discussion a little while ago about exactly that.) I'm
> just looking at the very simple and common case of referencing a
> subset of consecutive elements from a much larger whole.
>
> The closed interval makes more sense when the indices somehow *are*
> the values being retrieved.

You are right; see also note below on the case where [i,k[ is actually 
advantageous by itself.

> When you count from 1 to 9, you expect
> nine numbers: 1, 2, ..., 8, 9. When you list odd numbers from 1 to 9,
> you expect 1, 3, 5, 7, 9. But what if you're looking at a container
> train and numbering the twenty-foot-equivalent-units (TEU) that it
> has? A 40-foot container requires 2 TEU, a 60-foot container requires
> 3 TEU. A "reefer" (refridgerated container) might require an extra
> slot, or at least it might be a 56-footer and consume 3 TEU. One wagon
> might, if you're lucky, carry 5 TEU; numbering them 1 through 5 would
> be obvious, but numbering the boundaries between them as 0 through 5
> is better at handling the multiple TEU containers. (Even more so when
> you look at double-stacked containers. An over-height 40-foot
> container could consume 2 TEU horizontally and 2 TEU vertically, and
> be put in slots (0,0)-(2,2). This is, in fact, exactly how a GTK2
> Table layout works.) Both types of intervals have their places.

I also think there may be 2 kinds of notations for slices and such, one beeing 
[i,j] and the other maybe (i,n) where n is the number of items, rather than 
[i,k[ where k is the "post-last" or "past-the-end" index. Reasons to think on 
that path:

* since n is not an index, it avoids all thinking trouble and misinterpretations 
with k as opposed to j; in particular, it avoids the "intuitive conflict" evoked 
above

* n makes sense and is useful by itself (eg think at typical arrays {p,n} or 
slices/views {i,n}, or at algos for copy, compare, traversal, concat, map...)

* when [i,k[ works better than [i,j], most often it's because we have n (k=i+n) 
or need n (n=k-j), thus we avoid +1 or -1; this, rather than any worth of k by 
itself

* other cases where [i,k[ seems to work nicely is "self-feeding" in fact: we 
have & need [i,k[ just because the lang uses that, but the same would be true 
whatever the interval notation (eg the lang returns i,k from a builtin func 
searching something in a seq, and we then use it to get a subseq)

* the only advantage of k by itself, logically, I think, is when scanning a 
non-terminated token (eg a number): we must pass the last item (digit) to know 
the token is finished, thus end up holding i & k, not i & j; however, if we use 
(i,n) notation, it's easy enough to write (i,k-i), so no big deal, just as in 
the opposite case; and this situation is obviously, i guess, a little minority 
of uses of intervals [1]

Anyway, i think only practice of alternatives and talk among non-ideologically 
blinded programmers can tell us what's worth or not.

Denis

[1] However, from a semantic point of view, [i,k[ is problematic even in this 
very case where it seems nicer at first sight, because we don't need to type -1. 
Say we're scanning for a number and there is "1234567" in source:
	<-----> n
	1234567
	i     jk
We get i & k. If the lang uses [i,k[ intervals, then we just write it that way 
to get the right substring, and are pleased not to have to type -1. However, the 
"semantic truth" (if I may say) is that we stopped scanning *after* the last 
digit, and need to slice up to the *previous* character. This is not written in 
s[i,k]; where is the idea "up to the previous position" expressed in this 
notation? "Previous" translates to -1 in arithmetic or programming. For the 
notation to be semantically correct, it should say "-1" somewhere. And this is 
why closed intervals s[i,k-1] are superior, from the semantic perspective, even 
in this case, the very case where half-open intervals superficially look nicer. 
Half-open intervals do not say what they mean, so-to-say, they cheat ;-) (booh!)

A related point (semantics, thinking) is that, as far as i know, many 
programmers in langs using [i,k[ just do *not* think it. They just know from exp 
that it just works in most cases (reasons listed above) but do not think, for 
instance in this case along the lines: "all right, we stop after the last digit, 
thus need to slice up to the previous position, thus a half-open interval is 
right here". No, they seem to do it blindly like an automat. I asked other 
programmers about that when I noticed it was true by me (i use it blidly, don't 
know on a given case why/how it works unless I stop and *start* to think). I 
just prefer (to be) a programmer who thinks than a coding machine, but it's just me.