Question about idioms for clearing a list

Raymond Hettinger python at rcn.com
Fri Feb 10 03:53:23 EST 2006


[Alex Martelli]
> I was thinking of something different again, from a use case I did have:
>
> def buncher(sourceit, sentinel, container, adder, clearer):
>     for item in sourceit:
>         if item == sentinel:
>             yield container
>             clearer()
>         else:
>             adder(item)
>     yield container
>
> s = set()
> for setbunch in buncher(src, '', s, s.add, s.clear): ...

I'm curious, what is the purpose of filling and then clearing the same
container?   ISTM that the for-loop's setbunch assignment would then be
irrelevant since id(setbunch)==id(s).  IOW, the generator return
mechanism is not being used at all (as the yielded value is constant
and known in advance to be identical to s).
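To make the aliasing concrete, here is a Python 3 sketch of Alex's generator (with the missing colon after `else` restored) run over a hypothetical `src` list. Every yielded value is the same object `s`, so a consumer that wants to keep a bunch has to copy it before the next iteration clears it. Collecting the results with list() just gives several references to one set.

```python
def buncher(sourceit, sentinel, container, adder, clearer):
    # Alex's generator: fills `container` in place, yields it at each sentinel
    for item in sourceit:
        if item == sentinel:
            yield container
            clearer()
        else:
            adder(item)
    yield container

src = ['a', 'b', '', 'c', '', 'd', 'e']   # hypothetical sample input

s = set()
ids = set()
copies = []
for setbunch in buncher(src, '', s, s.add, s.clear):
    ids.add(id(setbunch))             # always id(s)
    copies.append(sorted(setbunch))   # snapshot before the generator clears s

print(ids == {id(s)})   # True
print(copies)           # [['a', 'b'], ['c'], ['d', 'e']]

# Without copying, every collected "bunch" is the same set object
s2 = set()
aliased = list(buncher(src, '', s2, s2.add, s2.clear))
print(all(b is s2 for b in aliased), sorted(s2))   # True ['d', 'e']
```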

Just for jollies, I experimented with other ways to do the same thing:

    from itertools import chain, groupby

    def buncher(sourceit, sentinel, container, updater, clearer):
        # Variant 1:  use iter() to do sentinel detection and
        # use updater() for fast, high volume updates/extensions
        it = iter(sourceit)
        for item in it:
            updater(chain([item], iter(it.next, sentinel)))
            yield container
            clearer()

    s = set()
    for setbunch in buncher(src, '', s, s.update, s.clear):
        print setbunch, id(setbunch)
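Variant 1 relies on the two-argument form iter(callable, sentinel), which calls callable() repeatedly and stops as soon as it returns a value equal to sentinel. A Python 3 sketch of the same idea might look like this (it.next is spelled it.__next__ there, and the iterator is built from sourceit); the sample input is hypothetical. One thing to watch: because the outer loop takes the first item of each bunch unconditionally, a doubled sentinel ends up inside the next bunch.

```python
from itertools import chain

def buncher(sourceit, sentinel, container, updater, clearer):
    # Variant 1 in Python 3: iter(callable, sentinel) ends each bunch at the sentinel
    it = iter(sourceit)
    for item in it:
        # `item` starts a bunch; the inner iterator pulls items until the sentinel
        # (when the source runs dry, the StopIteration from it.__next__ simply
        # ends the inner iterator)
        updater(chain([item], iter(it.__next__, sentinel)))
        yield container
        clearer()

src = ['a', 'b', '', 'c', '', 'd', 'e']   # hypothetical sample input
s = set()
bunches = []
for setbunch in buncher(src, '', s, s.update, s.clear):
    bunches.append(sorted(setbunch))   # copy before the generator clears s

print(bunches)   # [['a', 'b'], ['c'], ['d', 'e']]
```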

    def buncher(sourceit, sentinel, container, updater, clearer):
        # Variant 2:  use groupby() to do the bunching and
        # use updater() for fast, high volume updates/extensions
        for k, g in groupby(sourceit, lambda x: x != sentinel):
            if k:
                updater(g)
                yield container
                clearer()

    s = set()
    for setbunch in buncher(src, '', s, s.update, s.clear):
        print setbunch
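Variant 2 leans on groupby() splitting the stream into alternating runs keyed by `x != sentinel`. Looking at the raw groups for a hypothetical input shows why the `if k:` test is enough: sentinel runs come out as False groups and are simply skipped, and two adjacent sentinels collapse into one run, so no empty bunch is produced (Alex's original would yield an empty container there).

```python
from itertools import groupby

src = ['a', 'b', '', 'c', '', '', 'd', 'e']   # hypothetical input with a doubled sentinel

# list(g) must be taken before groupby advances to the next group
groups = [(k, list(g)) for k, g in groupby(src, lambda x: x != '')]
for k, items in groups:
    print(k, items)
# True ['a', 'b']
# False ['']
# True ['c']
# False ['', '']
# True ['d', 'e']
```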

Of course, if you give up the seemingly unimportant in-place update
requirement, then all three versions get simpler to implement and call:

    def buncher(sourceit, sentinel, constructor):
        # Variant 3:  return a new collection for each bunch
        for k, g in groupby(sourceit, lambda x: x != sentinel):
            if k:
                yield constructor(g)

    for setbunch in buncher(src, '', set):
        print setbunch
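Here is Variant 3 as a Python 3 sketch with a hypothetical input. Because each bunch is a freshly constructed object, the results can be collected with list() without any copying, unlike the in-place versions.

```python
from itertools import groupby

def buncher(sourceit, sentinel, constructor):
    # Variant 3: build a new collection from each run of non-sentinel items
    for is_data, group in groupby(sourceit, lambda x: x != sentinel):
        if is_data:
            yield constructor(group)

src = ['a', 'b', 'a', '', 'c', '', 'd', 'e']   # hypothetical sample input
bunches = list(buncher(src, '', set))
print([sorted(b) for b in bunches])                  # [['a', 'b'], ['c'], ['d', 'e']]
print(len(bunches), len({id(b) for b in bunches}))   # 3 3  (three distinct objects)
```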

Voila, the API is much simpler: there's no need to create the
destination container up front, and there's no need for adaptation
functions because the constructor APIs are polymorphic:

    constructor = list
    constructor = set
    constructor = dict.fromkeys
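The polymorphism here is simply that list, set, and dict.fromkeys all accept any iterable, including the one-shot group iterators that groupby() hands out, so each can be passed straight in as constructor. A quick Python 3 check on a hypothetical one-shot iterator:

```python
items = ['b', 'a', 'b']

# iter(items) stands in for a single-pass groupby group
results = [constructor(iter(items)) for constructor in (list, set, dict.fromkeys)]
for r in results:
    print(r)
# ['b', 'a', 'b']
# a set containing 'a' and 'b' (display order may vary)
# {'b': None, 'a': None}   (dict.fromkeys uses None as the default value)
```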



[Alex]
> d = dict()
> for dictbunch in buncher(src, '', d, lambda x: d.setdefault(x,''),
>                                     d.clear): ...
>
> L = list()
> for listbunch in buncher(src, '', L, L.append,
>                             lambda: L.__setslice__(0,len(L),[])): ...

Hmm, is your original buncher a candidate for adapters?  For instance,
could the buncher try to adapt any collection input to support its
required API of generic adds, clears, updates, etc.?
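One way to picture that: a small adapter function (adapt_for_buncher is a hypothetical name for illustration, not an existing API) that inspects the container and returns the adder and clearer callables the original buncher needs. This Python 3 sketch reuses the dict.setdefault trick from Alex's example and falls back to del c[:] for sequences without a clear() method.

```python
from collections import deque

def adapt_for_buncher(container):
    """Hypothetical adapter: return (adder, clearer) callables for container."""
    if isinstance(container, dict):
        def adder(item):
            container.setdefault(item, '')   # same trick as Alex's dict example
    elif hasattr(container, 'add'):          # set-like
        adder = container.add
    elif hasattr(container, 'append'):       # list- or deque-like
        adder = container.append
    else:
        raise TypeError('cannot adapt %s' % type(container).__name__)

    if hasattr(container, 'clear'):
        clearer = container.clear
    else:
        def clearer():
            del container[:]                 # slice deletion fallback
    return adder, clearer

sizes = []
for c in ([], set(), deque(), {}):
    add, clear = adapt_for_buncher(c)
    add('x')
    add('y')
    before = len(c)   # 2 items after two adds
    clear()
    sizes.append((type(c).__name__, before, len(c)))
print(sizes)
# [('list', 2, 0), ('set', 2, 0), ('deque', 2, 0), ('dict', 2, 0)]
```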



[Alex]
> So what is the rationale for having list SO much harder to use in such a
> way, than either set or collections.deque?

Sounds like a loaded question ;-)

If you're asking why lists don't have a clear() method, the answer is
that they already had two ways to do it (slice assignment and slice
deletion) and Guido must have valued API compactness over collection
polymorphism.  That preference also shows in set.add() vs
list.append() and in the two pop() methods having different
signatures.
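For reference, the two in-place idioms mentioned above (slice deletion and slice assignment), plus the clear() method that lists did eventually gain in Python 3.3, compared with plain rebinding, which does not clear the original list at all. A Python 3 sketch, using an alias to show which operations really act in place:

```python
L = [1, 2, 3]
alias = L                 # second reference to the same list object

del L[:]                  # slice deletion: empties the list in place
print(alias)              # []

L += [4, 5]               # += on a list extends it in place
L[:] = []                 # slice assignment: also in place
print(alias)              # []

L += [6, 7]
L.clear()                 # list.clear(), available from Python 3.3
print(alias)              # []

L += [8]
L = []                    # rebinding: L now names a new list
print(alias)              # [8]  (the original list was not cleared)
```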

If you're asking why your specific case looked so painful, I suspect
that it only looked hard because the adaptation was force-fit into a
lambda (the del-statement or slice assignment won't work as an
expression).  You would have had similar difficulties embedding
try/except logic or a print-statement.  Guido would, of course,
recommend using a plain def-statement:

    L = list()
    def L_clearer(L=L):
        del L[:]
    for listbunch in buncher(src, '', L, L.append, L_clearer):
        print listbunch
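The underlying restriction is easy to confirm: a lambda body must be a single expression, so `del L[:]` inside one is rejected at compile time, while the def-statement version works. If an expression really is wanted, the special method that `del L[:]` invokes can be called directly. A Python 3 sketch (Alex's L.__setslice__ no longer exists in Python 3):

```python
L = ['a', 'b']

def L_clearer(L=L):
    # default argument binds this particular list when the def runs
    del L[:]

L_clearer()
print(L)   # []

# A lambda body must be an expression, so `del` is a syntax error there
try:
    compile("clearer = lambda: del L[:]", "<sketch>", "exec")
    lambda_compiles = True
except SyntaxError:
    lambda_compiles = False
print(lambda_compiles)   # False

# Expression-only alternative: call the method that `del M[:]` uses
M = ['c', 'd']
M_clearer = lambda: M.__delitem__(slice(None))
M_clearer()
print(M)   # []
```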



While I question why in-place updating was needed in your example, it
did serve as a nice way to show off various approaches to adapting
non-polymorphic APIs for a generic consumer function with specific
needs.

Nice post,


Raymond



