Delayed evaluation of expressions [was Re: Time we switched to unicode?]

Wed Mar 26 22:16:14 EDT 2014

On Wed, 26 Mar 2014 20:44:17 -0400, Terry Reedy wrote:

> I agree that we have not been understanding each other.
> 
>  From your original post that I responded to:
>>>>>> The thing is, we can't just create a ∑ function, because it doesn't
>>>>>> work the way the summation operator works. The problem is that we
>>>>>> would want syntactic support, so we could write something like
>>>>>> this:
>  >>>>>        p = 2
>  >>>>>        ∑(n, 1, 10, n**p)
> 
> The initial misunderstanding is that I interpreted 'something like this'
> more loosely than you meant it. So I wrote something that was 'something
> like the above' to me but not to you. I interpreted 'something like'
> semantically* whereas you apparently meant it syntactically. I added the
> one or the other little marks needed to make the above work in python as
> it is whereas your 'something like' excludes such marks or any other
> current possibility I can think of. So I actually agree that "can't" is
> correct in relation to what you meant, as opposed to what I understood.
> 
> Changing "can't" to "can" would require a major change to Python. That
> change is one I would strongly oppose and I will try to explain more
> clearly why.

I'm not making a serious proposal to change Python's behaviour. In 
context, this came up because I said that allowing arbitrary maths 
symbols in Python wouldn't work, because you can't get the behaviour 
right, and used the example of ∑(n, 1, 10, n**p) as something where the 
behaviour would be different from what a mathematician would like. Given 
that the semantics would be so different, there's little point in trying 
to match the symbol ∑ too. Just write it as sum() in the Pythonic style. 
Python is not Mathematica. (However, you could, perhaps, write 
Mathematica in Python.)
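
As a minimal sketch of the Pythonic spelling, assuming the summation 
runs over n = 1..10 inclusive as in the ∑ example (the name `total` is 
only for illustration):

```python
p = 2

# Sum of n**p for n = 1..10. range(1, 11) is used because range()
# excludes its upper bound.
total = sum(n**p for n in range(1, 11))
print(total)  # 385
```

The generator expression is what keeps n**p from being evaluated once, 
up front, before sum() ever sees it.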

That was the context of introducing delayed evaluation to the discussion. 
But having raised it, I do think it is an important and useful feature. 
Python already has it in various ad hoc places, such as list comps and 
generator expressions, ternary if, and/or, and it came up again recently 
in the PEP for try...except expressions. It doesn't happen every day, but 
there is a steady trickle of requests (some successful, some not) for 
compiler support for something that includes delaying the evaluation of 
some expression until later. Some day, if all the pieces come into place, 
I may make a serious proposal for a generic mechanism for delaying 
execution of expressions. But that is not this day.
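
To make the ad hoc cases concrete, here is a small sketch. The helper 
`noisy` and the list `calls` are made up for the example; they only 
record which sub-expressions actually get evaluated.

```python
calls = []

def noisy(label, value):
    """Hypothetical helper: record that it was evaluated, return value."""
    calls.append(label)
    return value

# and/or: the right-hand operand is evaluated only if it is needed.
result_and = noisy("left", 0) and noisy("right", 1)

# Ternary if: only one of the two branches is evaluated.
x = 0
result_if = noisy("then", 1) / x if x != 0 else noisy("else", -1)

# Generator expression: the body runs only when the generator is consumed.
gen = (noisy("gen", n) for n in range(3))
calls_before_consuming = list(calls)
consumed = list(gen)

print(calls_before_consuming)  # ['left', 'else']
print(calls)                   # ['left', 'else', 'gen', 'gen', 'gen']
```

Neither "right" nor "then" ever appears in `calls`, and "gen" appears 
only once the generator is consumed.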

[...]
> Let's start with things we should agree on.  The first two are mostly not
> specific to Python.
> 
> 1. When a compiler encounters an expression in code, there are at least
> three things it can do:
> 1a. compile it so that it is immediately executed when encountered at
> runtime (let this be the default);
> 1b. compile it so that it is somehow saved for execution later or
> elsewhere (the alternative of concern here); 
> 1c. compile it for a different alternative, such as turning an implied
> 'get' operation into a 'set' operation.

I don't actually understand what 1c is. Or rather, I understand what you 
mean, but I don't understand why a compiler would do that. I don't think 
it is important, though, so carry on.

> 2. A code writer who wants an alternative treatment for a particular
> expression must somehow delineate the expression with boundary markers
> that, in context at least, also indicate the alternative treatment.

Not necessarily. That's not how it works in Pascal and Algol. In the case 
of Algol, argument passing is pass-by-name unless declared otherwise. In 
the case of Pascal, expressions are always evaluated first (pass-by-
value), except for one special case: if the expression consists of a 
single name, AND the function declares the parameter to be a "var" 
parameter, Pascal uses pass-by-reference, which you can think of as 
conceptually like a cut-down restricted version of pass-by-name.

But the point is that the decision whether to evaluate the expression 
immediately or not could be up to the compiler, not the caller. In fact, 
that's what Python already does:

    mylist and mylist[0]

always delays evaluation of the second clause; the caller doesn't have to 
do anything special except ensure she writes them in the right order.
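
A quick sketch of the difference between the operator and an ordinary 
function call. The function `and_` is hypothetical, written only to 
show what happens when the same two operands are passed as arguments 
instead:

```python
def and_(a, b):
    """Hypothetical function version of `and`: both arguments are
    evaluated by the caller before this body ever runs."""
    return a and b

mylist = []

# The operator never evaluates mylist[0] when mylist is empty.
first = mylist and mylist[0]
print(first)  # []

# The function call evaluates mylist[0] eagerly, so it fails.
try:
    and_(mylist, mylist[0])
except IndexError:
    outcome = "IndexError"
else:
    outcome = "no error"
print(outcome)  # IndexError
```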

The problem with the Pascal approach, where the function declares what 
calling mechanism to use, is that this needs the function to be known at 
compile time so that the compiler knows what mechanism to use to pass the 
argument into the function. As far as I know, the languages which offer a 
choice of calling convention (Pascal, Basic?) have static function 
declarations known at compile-time. I'm not sure about Perl.

Anyway, the point is that in principle there is another mechanism: the 
compiler knows whether to delay evaluation or not, and the caller has no 
say in it.

> There are, broadly, two possibilities:
> 2a. Use a generic quotation mechanism that can be applied to any
> expression most any place. I call this explicit quoting. 

Skipping ahead:

> Python has 2 pairs of generic explicit delimiters: open-close quote
> marks and lambda-<EndofLambda>, where <EndofLambda> is ',', ')',
> <EndofLine>, or maybe something else. If these are not present, the
> default immediate execution mode is used for expressions in function
> calls that are not themselves marked for alternative treatment.

I think that's a misuse of terminology, and it badly misses the point. If 
you have to *manually* manage this process yourself, by writing a 
function, or calling eval() on a string, it's not a feature of the 
language. Saying that Python has a generic mechanism for delaying 
execution of an expression ("put it in a function, or use a string and 
eval() the string later") is like saying that C has garbage collection 
("just keep track of which pointers you are or aren't using yourself").

Given the lack of such delayed execution, a reasonable work-around may 
sometimes be to use a function. But it's not the same. If language 
features weren't important, we'd all be using FORTRAN I exactly as it was 
in the 1950s.
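
For reference, the two manual work-arounds look roughly like this (a 
sketch only; the names `delayed` and `quoted` are invented for the 
example). In both cases the programmer, not the language, has to 
remember to trigger the evaluation later:

```python
x = 0

# Work-around 1: wrap the expression in a function (a "thunk").
delayed = lambda: 1.0 / x

# Work-around 2: keep the source text and eval() it later.
quoted = "1.0 / x"

# Nothing has been evaluated yet, so there is no ZeroDivisionError so far.
x = 4
via_lambda = delayed()    # uses the value of x at call time
via_eval = eval(quoted)   # likewise, evaluated in the current scope
```

Both `via_lambda` and `via_eval` end up as 0.25, but only because the 
caller explicitly wrote `delayed()` and `eval(quoted)`.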

> 2b. Use the
> expression in a special form that the compiler knows about. The special
> form must have begin-end markers of some sort. 

That's incorrect, unless you think "Start Of Expression" and "End Of 
Expression" are markers.

> I call this implicit
> quoting. If human readers do not know that a particular form is a
> special form, they may have a problem understanding the code.

Um, yes? If you don't know the semantics of the language you are reading, 
any language, you're going to have trouble understanding it. If I see:

    mylist = [1, 2, 3, 4, 5]*10000000
    for i in range(30):
        print(function(mylist, i))

and I don't know how Python works, I might say "ZOMG! That has to walk a 
fifty-million item linked list thirty times, copying it each time it is 
passed to the function! How inefficient!". And I would be wrong.

We're allowed to suppose that Python programmers are aware that lists 
aren't linked-lists, and that passing a list to a function does not copy 
it. We're allowed to suppose that Python programmers know that the 1/x 
expression in `1/x if x != 0 else y` is not evaluated unless x is 
non-zero. If there is a syntactic form that delays evaluation, the way 
and/or do, we're allowed to presume the programmer knows that this is 
what it does.
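
The no-copy claim is easy to check. In this sketch, `function` is a 
hypothetical stand-in for the one in the example above, and the list is 
kept small because its size makes no difference to the result:

```python
def function(seq, i):
    """Hypothetical stand-in: report whether the argument is the very
    same object as the caller's list, and return one item from it."""
    return seq is mylist, seq[i]

mylist = [1, 2, 3, 4, 5] * 10
same_object, item = function(mylist, 7)
print(same_object, item)  # True 3
```

The `is` test shows the function receives the caller's list object 
itself rather than a copy.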

> Python's special forms are statements. Each has its own pair of
> delimiters. Assignments use <BeginningofLine> and '='. 'For' loops use
> 'for' and 'in'. Other statements use 'as' and usually <EndofLine>.

Not always. You've missed ternary if, and, or, list comprehensions and 
generator expressions, all of which are expressions, not statements.

Python can delay execution of an expression within another expression. 
It's just that so far, they have to be specially treated by the compiler. 
There's no generic mechanism for this.

> Statements and function calls are syntactically very distinct, so there
> is little possibility of confusion. I consider this a major feature of
> python and I would oppose breaking it.

I'm not sure if this is relevant or not.

> Lisp, for instance, uses s-expressions for everything, including what
> would be either function calls or statements in Python. Special functions
> implicitly quote some argument expressions, but not necessarily all,
> while normal functions do not quote any. The only way to know is to
> know, and I found it confusing and difficult to remember.
> 
>  > Sum(i, 1, 100, V[i])
>  > In Algol60, this function call would:
> 
>  > - pass the name "i" (not a string!) as the first argument;
>  > - pass 1 as the second argument;
>  > - pass 100 as the third argument;
>  > - pass the expression "V[i]" (not a string!) as the fourth argument
> 
> which depends for this operation on
> 
>  > https://en.wikipedia.org/wiki/Jensen%27s_device
> 
> I read the whole article, including the criticisms and the fact that it
> was not widely adopted and has been more or less superseded by macros.
> It did not answer the obvious question: suppose the call is Sum(i, l, h,
> V[i]). How is the reader supposed to know that 'i' and 'V[i]' get quoted
> and the other args do not?

They infer it from the usage.

(Python example: given that you can write "mylist and mylist[0]", and it 
does not raise IndexError when mylist is empty, we can infer that the 
second term "mylist[0]" is not evaluated unless the first is true.)

Or they read the docs of the Sum function. Or both.

> The article included
> 
>   real procedure Sum(k, l, u, ak)
>        value l, u;
>        integer k, l, u;
>        real ak;
>        comment k and ak are passed by name;
>     begin
>        real s;
>        s := 0;
>        for k := l step 1 until u do
>           s := s + ak;
>        Sum := s
>     end;
> 
> If that is supposed to be real code that can be compiled, I see no way
> for the comment to be true. Or is the mechanism limited to builtin
> functions?

Pass-by-name is the default argument passing mechanism in Algol 60. The 
middle two parameters, l and u, are declared as pass-by-value in the 
second line. That means that k and ak use the default mechanism, which is 
pass by name.

Because functions in Algol 60 are statically determined at compile time, 
the compiler now knows how to pass arguments into the Sum function. I 
don't know how you would do that in a language like Python where the 
function is not statically known at compile time.
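
The closest Python can get today is to emulate it by hand. The sketch 
below is not pass-by-name; it is the manual work-around, with the 
caller explicitly wrapping the expression in a lambda. The `Cell` class 
and the sample data in `V` are assumptions made up for the example:

```python
class Cell:
    """Hypothetical mutable box standing in for the by-name variable i."""
    def __init__(self, value=0):
        self.value = value

def Sum(k, l, u, ak):
    """k is a Cell (emulating a by-name variable), l and u are plain
    values, and ak is a zero-argument function (emulating a by-name
    expression) that is re-evaluated on every iteration."""
    s = 0
    for n in range(l, u + 1):
        k.value = n     # corresponds to "for k := l step 1 until u"
        s = s + ak()    # corresponds to "s := s + ak"
    return s

V = [0] + [x * x for x in range(1, 101)]  # V[1]..V[100]; index 0 unused
i = Cell()
total = Sum(i, 1, 100, lambda: V[i.value])
print(total)  # 338350
```

Note that the caller has to write both the Cell and the lambda, which 
is exactly the burden that Algol's compiler takes on for you.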

-- 
Steven D'Aprano
http://import-that.dreamwidth.org/