Function attributes: a bug?

Alex Martelli aleax at aleax.it
Sat Mar 29 12:43:57 EST 2003


<posted & mailed>

Tj wrote:

> Hi,
> 
> I'm trying to make a closure using the following code.  It returns a
> function that can be used to keep adding stuff to a list.
> 
> def make_list_accum(init=[]):
>     def accum(elem):
>         accum.list.append(elem)
>         return accum.list
>     accum.list = init
>     return accum
> 
> But there's a bug -- if I generate a function using this, use it for a
> while to build up a list, then call make_list_accum() again, it will
> want to init itself with the other list by default, sharing it!  This
> is surprising behavior, and doesn't happen if I don't use this
> optional init form.
> 
> Any thoughts?

As I see many have already answered you, the default values of
arguments are evaluated only once -- this has nothing to do with
function attributes.  What I DO find surprising is the complication
you have chosen to use in this closure.  What role does argument
'init' play -- why isn't the penultimate statement just
    accum.list = []
for example?  Is it part of the specs that a list argument can
optionally be passed to make_list_accum, while the absence of such
an argument silently implies a new empty list is to be used?  In 
this case, the advice I've seen you've already been given (make
the default value None, and generate the new empty list if needed)
fully applies.
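To make the shared-default pitfall concrete, here is a minimal sketch (written in modern Python 3 syntax, so `print` is a function) contrasting the buggy version with the `is None` fix:

```python
def make_list_accum_buggy(init=[]):   # the default [] is created ONCE
    def accum(elem):
        accum.list.append(elem)
        return accum.list
    accum.list = init                 # every default call shares that one list
    return accum

def make_list_accum_fixed(init=None):
    if init is None:
        init = []                     # a fresh list on each default call
    def accum(elem):
        accum.list.append(elem)
        return accum.list
    accum.list = init
    return accum

a = make_list_accum_buggy()
a(1); a(2)
b = make_list_accum_buggy()
print(b(3))          # [1, 2, 3] -- b inherited a's list!

c = make_list_accum_fixed()
c(1)
d = make_list_accum_fixed()
print(d(2))          # [2] -- independent lists
```

The two accumulators made with the buggy version literally hold the same list object; the fixed version gives each one its own.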

But further, why do you need to single out the "list being appended
to" as accum.list?  Is there some spec that the current value of the
list must be accessible _without_ further accumulation?  If not,

def make_list_accum(initlist=None):
    if initlist is None: initlist = []
    def accum(elem):
        initlist.append(elem)
        return initlist
    return accum

If you do need to access the list, then I see why it makes sense
to add an "accum.list = initlist" statement right before the
"return accum", but it still seems better to use initlist in
the body of accum -- why take the extra indirection?  Or do your
specs also require the ability to CHANGE the list being accumulated?
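If rebinding were indeed part of the specs, the function-attribute indirection would earn its keep -- a sketch (Python 3; the mid-stream rebinding usage is hypothetical, not something the original post requires):

```python
def make_list_accum(initlist=None):
    if initlist is None:
        initlist = []
    def accum(elem):
        accum.list.append(elem)   # indirection through the attribute
        return accum.list
    accum.list = initlist
    return accum

acc = make_list_accum([1])
acc(2)                # -> [1, 2]
acc.list = []         # rebind: start accumulating into a fresh list
acc(3)                # -> [3]
```

Because `accum` reads `accum.list` on every call rather than closing over `initlist` directly, assigning a new list to the attribute redirects all further accumulation.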

Trying to reverse-engineer somewhat complicated specs from the
way a function has actually been coded (assuming that every
apparent complication in the function is entirely justified by
the requirements) is a somewhat unrewarding exercise, IMHO.

> BTW, I want to send something like this to Paul Graham for his closure
> shootout.  Is this better than the normal version which uses objects?

I think it depends on what axis you want to judge "better" along.
Closures are simpler than class instances when your needs are
quite well-limited.  But if you need your solution to do a lot of
things there comes a time in which stretching closures' modest
abilities becomes (IMHO) too complicated (and that is well before
it becomes impossible).  To judge in a given case, one needs but
code things out and perhaps apply timeit.py (from Python 2.3, but
you can get it and use it with 2.2 as well) to some benchmarks that
are representative of code one wants to run.
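For reference, the same kind of measurement can also be driven from inside a script through the timeit module's Timer class (Python 3 shown; the statement timed here is just a toy illustration, and the numbers are of course machine-dependent):

```python
import timeit

# Time a trivial statement: one list append per loop iteration.
t = timeit.Timer(stmt='x.append(1)', setup='x = []')
usec_per_loop = t.timeit(number=100000) / 100000 * 1e6
print('about %.3f usec per append' % usec_per_loop)
```

The command-line form of timeit.py used below does the same thing, with the -s option supplying the setup statement.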


Let's suppose, for example, that you only need some subset of
the functionality in your initial solution -- I'm guessing you
do need the ability to make an accumulator from either an
existing list or (by default) a new empty list, and to access
the list being accumulated to, both with and without appending
further to it, but don't need to rebind said list for one
given accumulator object.

Suppose further that typical usage parameters might be:
    -- generate N accumulators with existing lists
       and N more with lists that are default, new, empty ones;
    -- for each accumulator, call it 100 times for appending,
    -- and access its list-being-accumulated another 100 times.
If your typical operations would use different parameters, of
course, you can juggle the numbers.

Now, I would code a benchmark script be.py:

def clos(initlist=None):
    if initlist is None: initlist = []
    def accum(elem):
        initlist.append(elem)
        return initlist
    accum.thelist = initlist
    return accum

class clas(object):
    __slots__ = ['thelist']
    def __init__(self, initlist=None):
        if initlist is None: initlist = []
        self.thelist = initlist
    def __call__(self, elem):
        self.thelist.append(elem)
        return self.thelist

# we can see that, at these levels of functionality, the
# complication of the two approaches is comparable!

Nexist = Nnew = 100
nappend = 100
naccess = 100

def work(maker):
    for i in xrange(Nexist):
        accum = maker(range(3))
        for j in xrange(nappend): accum(j)
        for j in xrange(naccess): accum.thelist
    for i in xrange(Nnew):
        accum = maker()
        for j in xrange(nappend): accum(j)
        for j in xrange(naccess): accum.thelist
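(Before timing, a quick sanity check -- my addition, not part of be.py -- confirms the two makers expose the same tiny interface; restated self-contained in Python 3 form, where the benchmark's xrange would become range:)

```python
def clos(initlist=None):
    if initlist is None:
        initlist = []
    def accum(elem):
        initlist.append(elem)
        return initlist
    accum.thelist = initlist
    return accum

class clas(object):
    __slots__ = ['thelist']
    def __init__(self, initlist=None):
        if initlist is None:
            initlist = []
        self.thelist = initlist
    def __call__(self, elem):
        self.thelist.append(elem)
        return self.thelist

for maker in (clos, clas):
    acc = maker(list(range(3)))   # start from an existing list [0, 1, 2]
    assert acc(3) == [0, 1, 2, 3]
    assert acc.thelist == [0, 1, 2, 3]
    fresh = maker()               # default: a new empty list each time
    assert fresh(7) == [7]
```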

and finally use timeit.py on it:

[alex at lancelot alex]$ python timeit.py -s 'import be' 'be.work(be.clas)'
10 loops, best of 3: 1.26e+05 usec per loop
[alex at lancelot alex]$ python timeit.py -s 'import be' 'be.work(be.clos)'
10 loops, best of 3: 8.08e+04 usec per loop
[alex at lancelot alex]$

With a time consumption only about 0.64 times that of the class-based
solution, the closure-based solution can be said to be a bit "better"
for these benchmarking parameters, on this machine, for this version
of Python.  Using parameters that do a better job of representing
your actual application's needs will of course yield answers that are
more relevant for your specific case.  But I think you'll find that
this kind of speedup is roughly representative of many similar cases.


Alex




