[Python-ideas] Thunks (lazy evaluation) [was Re: Delay evaluation of annotations]

Mon Sep 26 10:04:52 EDT 2016

Hello everyone, this idea looks like something I have already tried
building: https://github.com/llllllllll/lazy_python. This project implements
a `thunk` class which builds up a deferred computation that is evaluated
only when needed. One use case I have had for this project is building up a
larger expression so that it may be simplified and then computed
concurrently with dask: http://daisy-python.readthedocs.io/en/latest/. By
building up a larger expression (and making the tree accessible), users have
the ability to remove common subexpressions or eliminate intermediate objects.
In numpy, chained expressions often make lots of allocations which are
quickly thrown away, which is why projects like numexpr
(https://github.com/pydata/numexpr) can be such a serious speedup. These
intermediates are required because the whole expression isn't known at the
start, so it must be evaluated as written.
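
To make the idea concrete, here is a minimal sketch of a deferred
expression tree. It is not lazy_python's actual API: the `Thunk` class, its
`wrap` and `force` methods, and the `expensive` function are all
hypothetical names chosen for illustration.

```python
import operator


class Thunk:
    """Deferred computation: a function plus its (deferred) arguments."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args          # child nodes of the expression tree
        self._forced = False
        self._value = None

    @classmethod
    def wrap(cls, value):
        # Lift a plain value into a leaf node; leave thunks untouched.
        if isinstance(value, cls):
            return value
        return cls(lambda: value)

    def force(self):
        # Evaluate at most once; later calls reuse the cached result.
        if not self._forced:
            self._value = self.func(*(a.force() for a in self.args))
            self._forced = True
        return self._value

    def __add__(self, other):
        return Thunk(operator.add, self, Thunk.wrap(other))

    def __mul__(self, other):
        return Thunk(operator.mul, self, Thunk.wrap(other))


calls = []


def expensive():
    # Hypothetical stand-in for a costly computation.
    calls.append("expensive")
    return 10


x = Thunk(expensive)
expr = (x + 1) * (x + 1)      # only builds the tree; nothing has run yet
calls_before_force = list(calls)
result = expr.force()         # x is computed once and shared by both branches
```

Because `args` is reachable from every node, a library could walk the tree
before forcing it, which is where simplification or handing the graph to a
scheduler would happen.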

Things to consider about when to evaluate:

1. Functions which branch on their input need to know which branch to
select.
2. Iteration is really hard to defer in a way that is efficient.
lazy_python just eagerly evaluates at iteration time but builds thunks in
the body.
3. Stateful operations like IO which normally have an implied order of
operation now need some explicit ordering.
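
Points 1 and 3 can be sketched together. This is a hypothetical `Thunk`
(again, not lazy_python's implementation) whose `__bool__` forces the
value, with a made-up `record` helper standing in for a side effect:

```python
class Thunk:
    """Hypothetical single-value thunk with memoization."""

    def __init__(self, func):
        self._func = func
        self._done = False
        self._value = None

    def force(self):
        if not self._done:
            self._value = self._func()
            self._done = True
        return self._value

    def __bool__(self):
        # A branch cannot be selected without the real value,
        # so truth testing is a point where evaluation must happen.
        return bool(self.force())


log = []


def record(name, value):
    # Stand-in for a stateful operation such as IO.
    log.append(name)
    return value


a = Thunk(lambda: record("a", 0))   # created first
b = Thunk(lambda: record("b", 1))   # created second
log_after_creation = list(log)

took_b_branch = False
if b:                # forces b
    took_b_branch = True
took_a_branch = False
if a:                # forces a
    took_a_branch = True
```

The side effects land in the order the thunks were forced (`b` before
`a`), not the order they were written, which is why stateful code needs
its ordering spelled out.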

Regarding the `Py_TYPE` change: I don't think that is correct unless we
made a thunk have the same binary representation as the underlying object.
A lot of code does a type check and then calls macros that act on the
actual type, like `PyTuple_GET_ITEM`, so we cannot fool C functions very
easily.
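
A Python-level analogue of the problem, as a sketch: the `Proxy` class
below is hypothetical and only pretends to be the object it wraps. It can
satisfy `isinstance`, but code that looks at the real type is not fooled.

```python
class Proxy:
    """Hypothetical stand-in that pretends to be the object it wraps."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    @property
    def __class__(self):
        # Report the wrapped object's class to Python-level checks.
        return type(self._wrapped)

    def __len__(self):
        return len(self._wrapped)


p = Proxy((1, 2, 3))

fooled_isinstance = isinstance(p, tuple)   # falls back to __class__
real_type_is_tuple = type(p) is tuple      # reads the real type

try:
    tuple.__len__(p)       # the C-implemented method wants a real tuple
    rejected = False
except TypeError:
    rejected = True
```

The proxy's memory layout is that of an ordinary instance, not a tuple, so
anything that indexes into the tuple's storage directly would be reading
the wrong structure.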

On Mon, Sep 26, 2016 at 9:27 AM, Sjoerd Job Postmus <sjoerdjob at sjoerdjob.com> wrote:

> On Mon, Sep 26, 2016 at 10:46:57PM +1000, Steven D'Aprano wrote:
> > Let's talk about lazy evaluation in a broader sense than just function
> > annotations.
> >
> > If we had syntax for lazy annotation -- let's call them thunks, after
> > Algol's thunks -- then we could use them in annotations as well as
> > elsewhere. But if we special case annotations only, the Zen has
> > something to say about special cases.
> >
> >
> > On Mon, Sep 26, 2016 at 02:57:36PM +1000, Nick Coghlan wrote:
> > [...]
> > > OK, that does indeed make more sense, and significantly reduces the
> > > scope for potential runtime compatibility breaks related to
> > > __annotations__ access. Instead, it changes the discussion to focus on
> > > the following main challenges:
> > >
> > > - the inconsistency introduced between annotations (lazily evaluated)
> > > and default arguments (eagerly evaluated)
> > > - the remaining compatibility breaks (depending on implementation
> > > details)
> > > - the runtime overhead of lazy evaluation
> > > - the debugging challenges of lazy evaluation
> >
> >
> > Default arguments are a good use-case for thunks. One of the most common
> > gotchas in Python is early binding of function defaults:
> >
> > def func(arg=[]):
> >     ...
> >
> > Nine times out of ten, that's probably not what you want. Now, to avoid
> > all doubt, I do not want to change function defaults to late binding.
> > I've argued repeatedly on comp.lang.python and elsewhere that if a
> > language only offers one of early binding or late binding, it should
> > offer early binding as Python does. The reason is, given early binding,
> > it is trivial to simulate something like late binding:
> >
> > def func(arg=None):
> >     if arg is None:
> >         arg = []
> >     ...
> >
> > but given late binding, it is ugly and inconvenient to get a poor
> > substitute for early binding when that's what you want. So, please,
> > let's not have a debate over the behaviour of function defaults.
> >
> > But what if we could have both? Suppose we use backticks `...` to make a
> > thunk, then we could write:
> >
> > def func(arg=`[]`):
> >     ...
> >
> > to get the late binding result wanted.
> >
> > Are there other uses for thunks? Potentially, they could be used for
> > Ruby-like code blocks:
> >
> > result = function(arg1, arg2, block=```# triple backticks
> >     do_this()
> >     do_that()
> >     while condition:
> >        do_something_else()
> >     print('Done')
> >     ```,
> >     another_arg=1)
> >
> >
> > but then I'm not really sure what advantage code blocks have over
> > functions.
> >
> >
> > > The inconsistency argument is simply that people will be even more
> > > confused than they are today if default arguments are evaluated at
> > > definition time while annotations aren't. There is a lot of code out
> > > there that actively relies on eager evaluation of default arguments,
> > > so changing that is out of the question, which then provides a strong
> > > consistency argument in favour of keeping annotations eagerly
> > > evaluated as well.
> >
> > Indeed. There are (to my knowledge) only two places where Python
> > delays evaluation of code:
> >
> > - functions (def statements and lambda expressions);
> > - generator expressions;
> >
> > where the second can be considered to be syntactic sugar for a generator
> > function (def with yield). Have I missed anything?
> >
> > In the same way that Haskell is fundamentally built on lazy evaluation,
> > Python is fundamentally built on eager evaluation, and I don't think we
> > should change that.
> >
> > Until now, the only way to delay the evaluation of code (other than the
> > body of a function, of course) is to write it as a string, then pass it
> > to eval/exec. Thunks offer an alternative for delayed evaluation that
> > makes it easier for editors to apply syntax highlighting: don't apply it
> > to ordinary strings, but do apply it to thunks.
> >
> > I must admit that I've loved the concept of thunks for years now, but
> > I'm still looking for the killer use-case for them, the one clear
> > justification for why Python should include them.
> >
> > - Late-bound function default arguments? Nice to have, but we already
> > have a perfectly serviceable way to get the equivalent behaviour.
> >
> > - Code blocks? Maybe a Ruby programmer can explain why they're so
> > important, but we have functions, including lambda.
> >
> > - Function annotations? I'm not convinced thunks are needed or desirable
> > for annotations.
> >
> > - A better way to write code intended for delayed execution? Sounds
> > interesting, but not critical.
> >
> > Maybe somebody else can think of the elusive killer use-case for thunks,
> > because I've been pondering this question for many years now and I'm no
> > closer to an answer.
>
> Well, there's a use-case I have been pondering for a long while now
> which could be satisfied by this: enumerated generator displays.
>
> So suppose you have a composite boolean value, composed by the 'and' of
> many conditions (which all take long to compute), and you want to
> short-circuit. Let's take the following example.
>
>     valid = True
>     valid &= looks_like_emailaddress(username)
>     valid &= more_than_8_characters(password)
>     valid &= does_not_exist_in_database(username)
>     valid &= domain_name_of_emailaddress_has_mx_record(username)
>     ... some more options ...
>
> (I forgot the exact use-case, but I still remember the functionality I
> wanted, so bear with me).
>
> Of course, the above is not short-circuiting, so it would be replaced by
>
>    def check_valid(username, password):
>        if not looks_like_emailaddress(username): return False
>        if not more_than_8_characters(password): return False
>        if not does_not_exist_in_database(username): return False
>        if not domain_name_of_emailaddress_has_mx_record(username): return False
>        ...
>        return True
>
>
>     valid = check_valid(username, password)
>
> or
>
>     valid = True\
>         and looks_like_emailaddress(username)\
>         and more_than_8_characters(password)\
>         and does_not_exist_in_database(username)\
>         and domain_name_of_emailaddress_has_mx_record(username)
>
> But in all reality, I want to write something like:
>
>     valid = all(@@@
>         looks_like_emailaddress(username),
>         more_than_8_characters(password),
>         does_not_exist_in_database(username),
>         domain_name_of_emailaddress_has_mx_record(username),
>     @@@)
>
> With `@@@` designating the beginning/ending of the enumerated generator
> display.
>
> Now, this is currently not possible, but if we had some kind of thunk
> syntax that would become possible, without needing an enumerated
> generator display.
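
For comparison, a sketch of how the same short-circuiting can be
approximated today by wrapping each condition in a lambda and letting
`all()` consume them lazily. The `check` helper and its hard-coded results
are hypothetical stand-ins for the real validation functions:

```python
calls = []


def check(name, result):
    # Hypothetical stand-in for an expensive validation step.
    calls.append(name)
    return result


checks = [
    lambda: check("looks_like_emailaddress", True),
    lambda: check("more_than_8_characters", False),
    lambda: check("does_not_exist_in_database", True),
]

# all() stops pulling from the generator at the first falsy result,
# so the remaining lambdas are never called.
valid = all(c() for c in checks)
```

Each lambda is effectively a hand-written thunk; the cost is the extra
`lambda:` noise on every line.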
>
> However the problem I see with the concept of `thunk` is: When does it
> get un-thunked? In which of the following cases?
>
> 1. When getting an attribute on it?
> 2. When calling it? --> See 1. with `__call__`.
> 3. When subindexing it? --> See 1. with `__getitem__`.
> 4. When assigning it to a name? It shouldn't have to be un-thunked, I
>    think.
> 5. When adding it to a list? No un-thunking should be necessary, I
>    think.
>
> However, the problem with thunks is (I think) that to make that happen
> either
>
> - *all* objects need to include yet another level of redirection,
> or
> - a thunk needs to get allocated the maximum size of the value it could
>   possibly store. (But a `unicode` object could have an arbitrary size)
> or
> - there needs to be some way to 'notify' objects holding the thunk that
>   its value got updated.  For a dict/list/tuple this could readily grow
>   into O(n) behaviour when un-thunking a thunk.
> or
> - any C-level functionality needs to learn how to deal with thunks. For
>   instance, `Py_TYPE` would have to *resolve* the thunk, and then return
>   the type of the value.
> or
> - I'm running out of ideas here, but maybe creating a custom type object
>   for each thunk that does pass-through to a wrapped item? Thunked
>   objects would work *exactly* the same as normal objects, but at a
>   (small) indirection for any action taken. Still, somehow `Py_TYPE` and
>   `Py_SIZE` and any other macros would still have to force evaluation.
>
> Kind regards,
> Sjoerd Job