[Python-Dev] accumulator display semantics

Alex Martelli aleaxit at yahoo.com
Thu Oct 16 06:00:04 EDT 2003


On Thursday 16 October 2003 07:04 am, Peter Norvig wrote:
> Yes, you're right -- with generator comprehensions, you can have
> short-circuit evaluation via functions on the result, and you can get
> at both original element and some function of it, at the cost of
> writing f(x), x.  So my proposal would be only a small amount of
> syntactic sugar over what you can do with generator comprehensions.

I _like_ your proposal, particularly in my proposed variant syntax
    foo[ x*x for x in xs if cond(x) ]
vs your original syntax
    [ foo: x*x for x in xs if cond(x) ]

I think the "indexing-like" syntax I proposed solves Greg's objection
that your "list display-like" syntax (and similar proposals for iterator
comprehensions) misleadingly suggests that a list is the result; an
indexing makes no such suggestion, as foo[bar] may just as well
be a sequence, an iterator, or anything else whatsoever depending
on foo (and perhaps on bar:-).

But syntax apart, let's dig a little bit more into the semantics.  At

http://www.norvig.com/pyacc.html

you basically propose that the infrastructure for an accumulator
display perform the equivalent of:

        for x in it:
            if a.add(f(x), x):
                break
        return a.result()

where a, in Ian Bicking's proposal, would be acc.__accum__()
(I like this, as it lets us use existing sum, max, etc, as accumulators,
by adding suitable methods __accum__ to them).

However, this would not let accumulator displays usefully return
iterators -- since the entire for loop is done by the infrastructure,
the best that a could do would be to store all needed intermediates
and return an iterator on them as a.result(), possibly wasting memory.
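
To make the memory concern concrete, here is a minimal sketch of the
infrastructure-driven loop.  The names run_display and BufferingIter are
hypothetical stand-ins of mine (they appear in neither proposal):
run_display plays the part of the hard-coded loop, and BufferingIter is
an accumulator that can only hand back an iterator after storing
every value it was fed.

```python
def run_display(a, f, it):
    """Hypothetical stand-in for the hard-coded infrastructure loop."""
    for x in it:
        if a.add(f(x), x):  # a true return value requests an early exit
            break
    return a.result()


class BufferingIter:
    """Hypothetical accumulator that wants to return an iterator."""

    def __init__(self):
        self.items = []

    def add(self, value, _original):
        self.items.append(value)  # every intermediate has to be stored
        return False              # never short-circuit

    def result(self):
        return iter(self.items)   # an iterator, but over a full buffer


acc = BufferingIter()
squares = run_display(acc, lambda x: x * x, range(5))
```

By the time result() is called the loop has already finished, so the
whole input has been consumed and buffered before the caller sees a
single item.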

My idea about this is still half-baked, but I think it's ready to post
and get your and others' feedback on.

Why not move the for loop, if needed, out of the hard-coded
infrastructure and just have accumulator display syntax such as:
    acc[x*x for x in it]
be exactly equivalent to:
    a = acc.__accum__(lambda x: x*x, iter(it))
    return a.result()
i.e., pass the callable corresponding to the expression, and the
iterator corresponding to the sequence, to the user-coded
accumulator.  Decisions would have to be taken regarding what
exactly to do when the display contains multiple for, if, and/or control
variables, as in
    acc[f(x,y,z) for x, y in it1 if y>x for z in g(y) if z<x+y]
and such nightmares; I'll assume in the following that any such
complicated display is conceptually brought back to the pristine
simplicity of
    acc[<expr>(x) for x in <it>]
where x can be a tuple of the multiple control variables involved
and iterable 'it' already encodes all nested-for's and if's into one
"stream" of values (some similar kind of decision will have to be
taken for your original suggestion, for iterator comprehensions,
and for any other such idea, it seems to me).
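
One way the reduction to a single stream might look, as a sketch only:
the helper desugar, the accumulator ListAcc, and the sample it1 and g
are all hypothetical names of mine, and packing the control variables
into a tuple is just one of the possible decisions mentioned above.

```python
def desugar(acc, exp, it):
    """Hypothetical expansion of acc[<expr>(x) for x in <it>]."""
    a = acc.__accum__(exp, iter(it))
    return a.result()


class ListAcc:
    """Hypothetical accumulator that collects results into a list."""

    def __init__(self, exp, it):
        self._result = [exp(x) for x in it]

    @classmethod
    def __accum__(cls, exp, it):
        return cls(exp, it)

    def result(self):
        return self._result


# Sample data, assumed purely for illustration.
it1 = [(1, 2), (3, 1), (2, 5)]


def g(y):
    return range(y)


# ListAcc[x+y+z for x, y in it1 if y>x for z in g(y) if z<x+y]
# All nested for's and if's are folded into one stream of tuples ...
stream = ((x, y, z) for x, y in it1 if y > x for z in g(y) if z < x + y)
# ... and the expression becomes a callable taking one such tuple.
out = desugar(ListAcc, lambda t: t[0] + t[1] + t[2], stream)
```

With this factoring the accumulator never sees the individual clauses,
only a callable and an iterator.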

The advantage of my idea would be to let accumulator display
syntax just as easily return iterators.  E.g., with something like:

class Accum(object):
    def __accum__(cls, exp, it):
        " make __accum__ a classmethod equivalent to calling the class "
        return cls(exp, it)
    __accum__ = classmethod(__accum__)
    def __init__(self, exp, it):
        " factor-out the common case of looping into this base-class "
        for item in it:
            if self.add(exp(item), item):
                break
    def result(self):
        " let self.add implicitly accumulate into self._result by default "
        return self._result

class Iter(Accum):
    def __init__(self, exp, it):
        " overriding Accum.__init__ as we don't wanna loop "
        self.exp = exp
        self.it = it
    def result(self):
        " overriding Accum.result with a generator "
        for item in self.it: yield self.exp(item)

you could code e.g.
    for y in Iter[ x*x for x in nums if good(x)]:
        blahblah(y)
as being equivalent to:
    for x in nums:
        if good(x):
            y = x*x
            blahblah(y)

but you could also code, roughly as in your original proposal,

class Mean(Accum):
    def __init__(self, exp=None, it=()):
        " do self attribute initializations then chain up to base class "
        self.total, self.n = 0, 0
        Accum.__init__(self, exp, it)
    def add(self, value, _ignore):
        " the elementary step is unchanged "
        self.total, self.n = self.total+value, self.n+1
    def result(self):
        " override Accum.result as this is better computed just once "
        return self.total / self.n

to keep the .add method factored out for non-display use (the
default empty it argument to __init__ is there for this specific
purpose, too), if you wished.
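
Since the display syntax does not exist, the classes above can only be
exercised by writing the proposed expansion out by hand.  The following
is a condensed, self-contained restatement under that assumption; nums
and good are sample values of mine, and the @classmethod decorator
form is used in place of the explicit classmethod(...) call.

```python
class Accum:
    @classmethod
    def __accum__(cls, exp, it):
        return cls(exp, it)

    def __init__(self, exp, it):
        # Common case: loop here, letting add() request an early exit.
        for item in it:
            if self.add(exp(item), item):
                break

    def result(self):
        return self._result


class Iter(Accum):
    def __init__(self, exp, it):
        self.exp, self.it = exp, it  # deliberately no looping

    def result(self):
        for item in self.it:
            yield self.exp(item)


class Mean(Accum):
    def __init__(self, exp=None, it=()):
        self.total, self.n = 0, 0
        Accum.__init__(self, exp, it)

    def add(self, value, _ignore):
        self.total, self.n = self.total + value, self.n + 1

    def result(self):
        return self.total / self.n


nums = [1, 2, 3, 4]          # sample data, assumed for illustration


def good(x):                 # sample predicate, assumed for illustration
    return x % 2 == 0


# Iter[x*x for x in nums if good(x)], expanded by hand
lazy = Iter.__accum__(lambda x: x * x, (x for x in nums if good(x))).result()
# Mean[x*x for x in nums], expanded by hand
mean = Mean.__accum__(lambda x: x * x, iter(nums)).result()

# Non-display use of the factored-out add method
m = Mean()
m.add(10, None)
m.add(20, None)
```

Here lazy is a generator that computes nothing until it is iterated,
while mean is already a finished number.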


Basically, my proposal amounts to a different factoring of accumulator
display functionality between Python's hard-coded infrastructure, and
functionality to be supplied by the standard library module accum that
you already propose.  By having much less in the hard-coded parts --
basically just the identification and passing-on of the proper expression
and iterator -- and correspondingly more in the standard library, we gain
flexibility because a base class in the library may be more flexibly
"overridden", in part or in its entirety (an accumulator doesn't HAVE to
inherit from class Accum at all, if it just wants to reimplement both
of the __accum__ and result methods on its own).  If this slows things
down a bit we may perhaps in the future hard-code some special cases,
but worrying about it now would feel like premature optimization to me.
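
As a sketch of the "doesn't HAVE to inherit" case, here is a
hypothetical accumulator, First (my name, not part of any proposal),
that supplies its own __accum__ and result and stops consuming the
iterator as soon as it has an answer.  The display is again expanded by
hand.

```python
class First:
    """Hypothetical: first true value of the expression, else None."""

    @classmethod
    def __accum__(cls, exp, it):
        self = cls()
        self.value = None
        for item in it:
            value = exp(item)
            if value:
                self.value = value
                break  # short-circuit: leave the rest unconsumed
        return self

    def result(self):
        return self.value


source = iter([0, 0, 3, 4])
# First[x*10 for x in source], expanded by hand
found = First.__accum__(lambda x: x * 10, source).result()
remaining = list(source)  # whatever the accumulator did not consume
```

Because the accumulator owns the loop, the short-circuit needs no
cooperation from the infrastructure at all.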


Alex



