[Python-Dev] RE: cloning iterators again

Raymond Hettinger python at rcn.com
Mon Oct 27 06:24:57 EST 2003


> I also note that the current tee() doesn't let you use __copy__ easily
> (it would be quite messy I think).  The linked-list version supports
> __copy__ trivially.  This may be important if we execute (as I think
> we should) on the idea of making selected iterators __copy__-able
> (especially all the standard container iterators and xrange).

The current tee() was written to support only a two way split, but it
can easily be cast as a multi-way splitter with no problem.  

The only real difference in the ideas presented so far are whether the
underlying queue should be implemented as a singly linked list or as a
double stack.

As a proof-of-concept, here is GvR's code re-cast with the queue changed
to a double stack implementation.  The interface is completely
unchanged.  The memory consumed is double that the current tee() but
much less than the linked list version.  The speed is half that of the
current tee() and roughly comparable to or slightly better than the
linked list version.


Raymond Hettinger


------------------------------------------------------------------------
--

""" Guido's demo program re-cast with a different underlying data
structure

Replaces the linked list based queue with a two stack based queue.

Advantages:

   The double stack method consumes only two pointers per data element
   while the linked list method consumes space for a link object
   (8 to 10 words).

   The double stack method uses contiguous memory while the link
   objects are more fragmented.

   The stack method uses append() and pop() which are optimized to
   minimize memory management calls.  For the link method, every 
   link costs a malloc and free.

Todo:
    Handle Wrappers that are GC'd before termination.
    Add support for holding an exception.

"""

class TeeMaster(object):
    """Holder for information common to wrapped iterators

    """

    def __init__(self, it):
        self.inbasket = []
        self.inrefcnt = []
        self.outbasket = []
        self.outrefcnt = []
        self.numseen = 0
        self.it = it
        self.numsplits = 0

class Wrapper(object):
    """Copyable wrapper around an iterator.

    Any number of Wrappers around the same iterator share the TeeMaster
    object.  The Wrapper that is most behind will drop the refcnt to
    zero, which causes the reference to be popped off of the queue.

    The newest Wrapper gets a brand new TeeMaster object.  Later
    wrappers share an existing TeeMaster object.  Since they may
    have arrived late in the game, they need to know how many objects
    have already been seen by the wrapper.  When they call next(),
    they ask for the next numseen.

    If a Wrapper is garbage-collected before it finishes, the refcount
    floor needs to be raised.  That has not yet been implemented.

    """

    __slots__ = ["master", "numseen"]

    def __init__(self, it, master=None):
        """Constructor.  The master argument is used by __copy__
below."""
        if master is None:
            master = TeeMaster(it)
        self.master = master
        self.numseen = master.numseen
        self.master.numsplits += 1

    def __copy__(self):
        """Copy the iterator.

        This returns a new iterator that will return the same series
        of results as the original.

        """
        return Wrapper(None, self.master)

    def __iter__(self):
        """All iterators should support __iter__() returning self."""
        return self

    def next(self):
        """Get the next value of the iterator, or raise
StopIteration."""

        master = self.master
        inbasket, inrefcnt = master.inbasket, master.inrefcnt

        if master.numseen == self.numseen:
            # This is the lead dog so get a value through the iterator
            value = master.it.next()
            master.numseen += 1            
            # Save it for the other dogs
            inbasket.append(value)
            inrefcnt.append(master.numsplits-1)
            self.numseen += 1
            return value

        # Not a lead dog -- the view never changes :-(

        location = len(inbasket) - (master.numseen - self.numseen)

        if location >= 0:
            # Our food is in the inbasket
            value = inbasket[location]
            inrefcnt[location] -= 1
            rc = inrefcnt[location]
        else:
            # Our food is in the outbasket
            location = -(location + 1)
            value = master.outbasket[location]
            master.outrefcnt[location] -= 1
            rc = master.outrefcnt[location]
            
        # Purge doggie bowl when no food is left
        if rc == 0:
            if len(master.outbasket) == 0:
                master.outbasket, master.inbasket = master.inbasket,
master.outbasket
                master.outrefcnt, master.inrefcnt = master.inrefcnt,
master.outrefcnt
                master.outbasket.reverse()
                master.outrefcnt.reverse()
            master.outbasket.pop()
            master.outrefcnt.pop()

        self.numseen += 1
        return value

def tee(it):
    """Replacement for Raymond's tee(); see examples in itertools
docs."""
    if not hasattr(it, "__copy__"):
        it = Wrapper(it)
    return (it, it.__copy__())

def test():
    """A simple demonstration of the Wrapper class."""
    import random
    def gen():
        for i in range(10):
            yield i
    it = gen()
    a, b = tee(it)
    b, c = tee(b)
    c, d = tee(c)
    iterators = [a, b, c, d]
    while iterators != [None, None, None, None]:
        i = random.randrange(4)
        it = iterators[i]
        if it is None:
            next = "----"
        else:
            try:
                next = it.next()
            except StopIteration:
                next = "****"
                iterators[i] = None
        print "%4d%s%4s%s" % (i, "   ."*i, next, "   ."*(3-i))

if __name__ == "__main__":
    test()




More information about the Python-Dev mailing list