[Python-3000] Wild idea: Deferred Evaluation & Implicit Lambda
Talin
talin at acm.org
Tue May 30 12:13:07 CEST 2006
I want to take a stab at unifying a number of ideas that have been
proposed recently:
-- defaultdict - the desire to allow for a 'default' value that
doesn't get evaluated unless its needed.
-- the 'shortcut' operators 'and' and 'or', which aren't overloadable
because the current overloading mechanism won't handle the
current shortcut behavior.
-- various proposals with generators, list comprehensions, lambda,
etc.
Lets take the first case, defaultdict. The original motivation was to be
able to do something like:
a.setdefault( key, [] )
...but avoid the evaluation of the list creation unless it was actually
needed. The (eventual) proposed solution was to create a special
subclass of dict that would have a factory attribute for creating values
as needed.
However, I can think of a number of use cases which defaultdict doesn't
address. For example - suppose I have two dictionaries that I wish to
search, such that if I don't find a value in the first, I want to look
in the second; And if I don't find a value in the second, I want to
default to zero. (This pattern happens to me rather a lot - because I
tend to do a lot of stuff with parsing and nested symbol tables.)
One simple but inefficient way to code this would be:
value = a.getitem( key, b.getitem( key, 0 ) )
The problem here, of course, is that the second lookup gets performed
regardless of whether it is needed. Instead, you have to code something
like this:
if hasattr( a, key ):
value = a.getitem( key )
else:
value = b.getitem( key, 0 )
or even:
try:
value = a[ key ]
except KeyError:
value = b.getitem( key, 0 )
Now, suppose that getitem() expected the second argument to be a factory
function instead of an actual value, in which case you'd write it like this:
value = a.getitem( key, lambda: b.getitem( key, 0 ) )
The problem with this, of course, is that you now have to have a special
version of "getitem" which expects a callable instead of a value.
What if, instead, there was a way to mark a calculation as "lazy",
meaning that instead of actually doing the calculation, it would instead
insert a generator-like object which would produce the value the first
time it is accessed. Lets assume for a moment that they keyword 'lazy'
is used to indicate this situation (more likely it would be an operator,
but for simplicity I'll use a keyword):
value = a.getitem( key, lazy b.getitem( key, 0 ) )
So 'lazy' is kind of like an implicit lambda - that is, it acts like a
parameterless lambda that auto-evaluates itself at the right time.
Similarly, the 'setdefault' case becomes:
a.setdefault( key, lazy [] ).append( item )
In other words, the creation of the list isn't actually evaluated unless
it is needed.
Now, from an implementation standpoint, the implementation of a
generator-like construction isn't terribly hard; The difficult problem
is invoking the generator upon first use. There's no explicit protocol
that signals a value that it is being "used" as a function argument. You
generally can't write a class that somehow 'detects' whether or not a
function is retrieved from the function's argument list.
One way around that would be to have the 'lazy' attribute associated
with the formal parameter of the called function instead:
def getitem( self, key, lazy default_val ):
...
Unfortunately, this breaks the "no programmable syntax" rule - because
there's no way for a given program to know whether or not the function
being called has a lazy argument or not. I don't know of a way around
this, to be honest, so this idea won't work unless someone out there has
some clever idea that I haven't thought of.
However, it is still interesting to consider what the ramifications of
such a feature might be. For one thing, it makes the implementation of
__and__ and __or__ shortcut operators trivial:
def __and__( self, lazy second ):
...
def __or__( self, lazy second ):
...
And in fact, you could even do LISP-ish like statements such as 'cond':
def cond( self, test, lazy result1, lazy result2 ):
...
...or any other non-looping control construct (switch, etc.). (I say
non-looping because I assume that lazy values aren't re-evaluable like
callables, i.e. the lazy generator is replaced with its output on first
use.) This gets into the realm of 'evil' hidden control-flow-modifiers
and such.
Anyway, I realize that this is not doable (not without drastically
changing the nature of Python), but it's still interesting to think about.
-- Talin
More information about the Python-3000
mailing list