[Python-Dev] iterators

M.-A. Lemburg mal@lemburg.com
Mon, 21 Aug 2000 12:04:04 +0200


This is a multi-part message in MIME format.
--------------D181A42EF954A5F1E101C9DB
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Guido van Rossum wrote:
> 
> Paul Prescod wrote:
> 
> > I don't think of iterators as indexing in terms of numbers. Otherwise I
> > could do this:
> >
> > >>> a={0:"zero",1:"one",2:"two",3:"three"}
> > >>> for i in a:
> > ...     print i
> > ...
> >
> > So from a Python user's point of view, for-looping has nothing to do
> > with integers. From a Python class/module creator's point of view it
> > does have to do with integers. I wouldn't be either surprised nor
> > disappointed if that changed one day.
> 
> Bingo!
> 
> I've long had an idea for generalizing 'for' loops using iterators. This is
> more a Python 3000 thing, but I'll explain it here anyway because I think
> it's relevant. Perhaps this should become a PEP?  (Maybe we should have a
> series of PEPs with numbers in the 3000 range for Py3k ideas?)
> 
> The statement
> 
>   for <variable> in <object>: <block>
> 
> should translate into this kind of pseudo-code:
> 
>   # variant 1
>   __temp = <object>.newiterator()
>   while 1:
>       try: <variable> = __temp.next()
>       except ExhaustedIterator: break
>       <block>
> 
> or perhaps (to avoid the relatively expensive exception handling):
> 
>   # variant 2
>   __temp = <object>.newiterator()
>   while 1:
>       __flag, <variable = __temp.next()
>       if not __flag: break
>       <block>
> 
> In variant 1, the next() method returns the next object or raises
> ExhaustedIterator. In variant 2, the next() method returns a tuple (<flag>,
> <variable>) where <flag> is 1 to indicate that <value> is valid or 0 to
> indicate that there are no more items available. I'm not crazy about the
> exception, but I'm even less crazy about the more complex next() return
> value (careful observers may have noticed that I'm rarely crazy about flag
> variables :-).
> 
> Another argument for variant 1 is that variant 2 changes what <variable> is
> after the loop is exhausted, compared to current practice: currently, it
> keeps the last valid value assigned to it. Most likely, the next() method
> returns None when the sequence is exhausted. It doesn't make a lot of sense
> to require it to return the last item of the sequence -- there may not *be*
> a last item, if the sequence is empty, and not all sequences make it
> convenient to keep hanging on to the last item in the iterator, so it's best
> to specify that next() returns (0, None) when the sequence is exhausted.
> 
> (It would be tempting to suggeste a variant 1a where instead of raising an
> exception, next() returns None when the sequence is exhausted, but this
> won't fly: you couldn't iterate over a list containing some items that are
> None.)

How about a third variant:

#3:
__iter = <object>.iterator()
while __iter:
   <variable> = __iter.next()
   <block>

This adds a slot call, but removes the malloc overhead introduced
by returning a tuple for every iteration (which is likely to be
a performance problem).

Another possibility would be using an iterator attribute
to get at the variable:

#4:
__iter = <object>.iterator()
while 1:
   if not __iter.next():
        break
   <variable> = __iter.value
   <block>

> Side note: I believe that the iterator approach could actually *speed up*
> iteration over lists compared to today's code. This is because currently the
> interation index is a Python integer object that is stored on the stack.
> This means an integer add with overflow check, allocation, and deallocation
> on each iteration! But the iterator for lists (and other basic sequences)
> could easily store the index as a C int! (As long as the sequence's length
> is stored in an int, the index can't overflow.)

You might want to check out the counterobject.c approach I used
to speed up the current for-loop in Python 1.5's ceval.c:
it's basically a mutable C integer which is used instead of
the current Python integer index.

The details can be found in my old patch:

  http://starship.python.net/crew/lemburg/mxPython-1.5.patch.gz

> [Warning: thinking aloud ahead!]
> 
> Once we have the concept of iterators, we could support explicit use of them
> as well. E.g. we could use a variant of the for statement to iterate over an
> existing iterator:
> 
>   for <variable> over <iterator>: <block>
> 
> which would (assuming variant 1 above) translate to:
> 
>   while 1:
>       try: <variable> = <iterator>.next()
>       except ExhaustedIterator: break
>       <block>
> 
> This could be used in situations where you have a loop iterating over the
> first half of a sequence and a second loop that iterates over the remaining
> items:
> 
>   it = something.newiterator()
>   for x over it:
>       if time_to_start_second_loop(): break
>       do_something()
>   for x over it:
>       do_something_else()
> 
> Note that the second for loop doesn't reset the iterator -- it just picks up
> where the first one left off! (Detail: the x that caused the break in the
> first loop doesn't get dealt with in the second loop.)
> 
> I like the iterator concept because it allows us to do things lazily. There
> are lots of other possibilities for iterators. E.g. mappings could have
> several iterator variants to loop over keys, values, or both, in sorted or
> hash-table order. Sequences could have an iterator for traversing them
> backwards, and a few other ones for iterating over their index set (cf.
> indices()) and over (index, value) tuples (cf. irange()). Files could be
> their own iterator where the iterator is almost the same as readline()
> except it raises ExhaustedIterator at EOF instead of returning "".  A tree
> datastructure class could have an associated iterator class that maintains a
> "finger" into the tree.
> 
> Hm, perhaps iterators could be their own iterator? Then if 'it' were an
> iterator, it.newiterator() would return a reference to itself (not a copy).
> Then we wouldn't even need the 'over' alternative syntax. Maybe the method
> should be called iterator() then, not newiterator(), to avoid suggesting
> anything about the newness of the returned iterator.
> 
> Other ideas:
> 
> - Iterators could have a backup() method that moves the index back (or
> raises an exception if this feature is not supported, e.g. when reading data
> from a pipe).
> 
> - Iterators over certain sequences might support operations on the
> underlying sequence at the current position of the iterator, so that you
> could iterate over a sequence and occasionally insert or delete an item (or
> a slice).

FYI, I've attached a module which I've been using a while for
iteration. The code is very simple and implements the #4 variant
described above.
 
> Of course iterators also connect to generators. The basic list iterator
> doesn't need coroutines or anything, it can be done as follows:
> 
>   class Iterator:
>       def __init__(self, seq):
>           self.seq = seq
>           self.ind = 0
>       def next(self):
>           if self.ind >= len(self.seq): raise ExhaustedIterator
>           val = self.seq[self.ind]
>           self.ind += 1
>           return val
> 
> so that <list>.iterator() could just return Iterator(<list>) -- at least
> conceptually.
> 
> But for other data structures the amount of state needed might be
> cumbersome. E.g. a tree iterator needs to maintain a stack, and it's much
> easier to code that using a recursive Icon-style generator than by using an
> explicit stack. On the other hand, I remember reading an article a while ago
> (in Dr. Dobbs?) by someone who argued (in a C++ context) that such recursive
> solutions are very inefficient, and that an explicit stack (1) is really not
> that hard to code, and (2) gives much more control over the memory and time
> consumption of the code. On the third hand, some forms of iteration really
> *are* expressed much more clearly using recursion. On the fourth hand, I
> disagree with Matthias ("Dr. Scheme") Felleisen about recursion as the root
> of all iteration. Anyway, I believe that iterators (as explained above) can
> be useful even if we don't have generators (in the Icon sense, which I
> believe means coroutine-style).

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
--------------D181A42EF954A5F1E101C9DB
Content-Type: text/python; charset=us-ascii;
 name="Iterator.py"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="Iterator.py"

""" Generic object iterators.

    (c) Copyright Marc-Andre Lemburg; All Rights Reserved.
    See the documentation for further information on copyrights,
    or contact the author (mal@lemburg.com).

"""
import exceptions

class Error(exceptions.StandardError):
    pass

class Iterator:

    value = None                # Current value

    def __init__(self,obj,startpos=0):

        self.obj = obj
        self.index = startpos
        self.startpos = startpos
        self.step = 1

    def set(self,pos):

        self.index = self.startpos = pos 
        self.value = self.obj[pos]

    def reset(self):

        self.index = pos = self.startpos
        self.value = self.obj[pos]

    def next(self):

        self.index = i = self.index + self.step
        if i < 0:
            return 0
        try:
            self.value = self.obj[i]
            return 1
        except IndexError:
            return 0

    def prev(self):

        self.index = i = self.index - self.step
        if i < 0:
            return 0
        try:
            self.value = self.obj[i]
            return 1
        except IndexError:
            return 0

    def __getitem__(self,i):

        if i != 0:
            self.index = i = self.index + self.step
        else:
            i = self.index
        if i < 0:
            raise IndexError
        self.value = v = self.obj[i]
        return v

ForwardIterator = Iterator

class BackwardIterator(Iterator):

    def __init__(self,obj,startpos=None):

        self.obj = obj
        if startpos is not None:
            self.index = startpos
        else:
            self.index = startpos = len(obj) - 1
        self.startpos = startpos
        self.value = self.obj[startpos]
        self.step = -1

class CallIterator(Iterator):

    def __init__(self,obj):

        self.obj = obj
        self.index = 0

    def set(self,pos):

        raise Error,'.set() not supported'

    def reset(self):

        raise Error,'.reset() not supported'

    def next(self):

        self.index = self.index + 1
        try:
            v = self.obj()
            if not v:
                return 0
            self.value = v
            return 1
        except IndexError:
            return 0

    def prev(self):

        raise Error,'.prev() not supported'

    def __getitem__(self,i):

        self.index = self.index + 1
        v = self.obj()
        if not v:
            raise IndexError
        self.value = v
        return v


def _test():
    i = BackwardIterator(range(1,10))
    for x in i:
        print x
    print
    i.reset()
    while 1:
        print i.value
        if not i.next():
            break
    print
    filereader = CallIterator(open('/usr/dict/words').readline)
    for line in filereader:
        pass
    print 'Read %i lines' % filereader.index

if __name__ == '__main__':
    _test()

--------------D181A42EF954A5F1E101C9DB--