[Python-Dev] cloning iterators again

Guido van Rossum guido at python.org
Sun Oct 26 16:16:31 EST 2003


The following is just so beautiful, I have to share it.

I've been thinking about various implementations of Andrew Koenig's
idea of "copyable iterator wrappers", which support a generalization
of Raymond Hettinger's tee().  This needs some kind of queue-ish data
structure, but all queues that I could think of required too much
administration for the additional requirements.  I finally realized
(this may come as no surprise to Andrew :-) that the best
implementation is a singly-linked list.  Yes, a use case for a linked
list in Python!  This nicely takes care of the GC issue when one of
the iterators is discarded before being exhausted.

I hope Raymond can implement this in C.

class Link(object):
    """Singly-linked list of (state, value) pairs.

    The slots are manipulated directly by Wrapper.next() below.

    The state slot can have three values, which determine the meaning
    of the value slot:

    state = 0   => value is not yet determined (set to None)
    state = 1   => value is value to be returned by next()
    state = -1  => value is exception to be raised by next()

    The next slot points to the next Link in the chain; it is None at
    the end of the chain (state <= 0).

    """

    __slots__ = ["state", "value", "next"]

    def __init__(self):
        self.state = 0
        self.value = None
        self.next = None

class Wrapper(object):
    """Copyable wrapper around an iterator.

    Any number of Wrappers around the same iterator share the same
    chain of Links.  The Wrapper that is most behind references the
    oldest Link, and as it moves forward the oldest Link instances are
    automatically discarded.

    The newest Link has its value set to None and its state set to 0.
    When a Wrapper needs to get the value out of this Link, it calls
    next() on the underlying iterator and stores it in the Link,
    setting its state to 1, for the benefit of other Wrappers that are
    behind it.  If the underlying iterator's next() raises an
    exception, the Link's state is set to -1 and its value to the
    exception instance instead.

    When the oldest Wrapper is garbage-collected before it finishes
    the chain, the Links that it owns are also garbage-collected, up
    to the next Link still owned by a live Wrapper.

    """

    __slots__ = ["it", "link"]

    def __init__(self, it, link=None):
        """Constructor.  The link argument is used by __copy__ below."""
        self.it = it
        if link is None:
            link = Link()
        self.link = link

    def __copy__(self):
        """Copy the iterator.

        This returns a new iterator that will return the same series
        of results as the original.

        """
        return Wrapper(self.it, self.link)

    def __iter__(self):
        """All iterators should support __iter__() returning self."""
        return self

    def next(self):
        """Get the next value of the iterator, or raise StopIteration."""
        link = self.link
        if link is None:
            raise StopIteration
        state, value, next = link.state, link.value, link.next
        if state == 0:
            # At the head of the chain: move underlying iterator
            try:
                value = self.it.next()
            except StopIteration, exc:
                value = exc
                state = -1
            else:
                state = 1
            link.state = state
            link.value = value
        if state < 0:
            self.link = None
            raise value
        assert state > 0
        if next is None:
            next = Link()
            link.next = next
        self.link = next
        return value

def tee(it):
    """Replacement for Raymond's tee(); see examples in itertools docs."""
    if not hasattr(it, "__copy__"):
        it = Wrapper(it)
    return (it, it.__copy__())

def test():
    """A simple demonstration of the Wrapper class."""
    import random
    def gen():
        for i in range(10):
            yield i
    it = gen()
    a, b = tee(it)
    b, c = tee(b)
    c, d = tee(c)
    iterators = [a, b, c, d]
    while iterators != [None, None, None, None]:
        i = random.randrange(4)
        it = iterators[i]
        if it is None:
            next = "----"
        else:
            try:
                next = it.next()
            except StopIteration:
                next = "****"
                iterators[i] = None
        print "%4d%s%4s%s" % (i, "   ."*i, next, "   ."*(3-i))

if __name__ == "__main__":
    test()

--Guido van Rossum (home page: http://www.python.org/~guido/)



More information about the Python-Dev mailing list