Iterator / Iteratable confusion

Francis Girard francis.girard at free.fr
Sun Feb 13 13:58:27 EST 2005


Hi,

I wrote simple dummy examples to teach to myself iterators and generators.

I think that the iteration protocol brings a very neat confusion between the 
iterator and what it iterates upon (i.e. the iteratable). This is outlined in 
"example 3" in my dummy examples.

What are your feelings about it ? 

Regards

Francis Girard

==== BEGINNING OF EXAMPLES ====

from itertools import tee, imap, izip, islice
import sys
import traceback

sEx1Doc = \
"""
================================================================================
Example 1:
================================================================================

An ""iteratable"" class is a class supporting the __iter__ method which should
return an ""iterator"" instance, that is, an instance of a class supporting
the "next" method. 

An iteratable, strictly speaking, doesn't have to support the "next" method 
and 
an "iterator" doesn't have to support the "__iter__" method (but this breaks 
the
iteration protocol as we will later see).

The ""for ... in ..."" construct now expect an iteratable instance in its 
second argument place. It first invoke its __iter__ method and then repeatedly
invoke the ""next"" method on the object returned by ""__iter__"" until the
""StopIteration"" exception is raised.

The designing goal is to cleanly separate a container class with the way we
iterate over its internal elements.
"""
class Ex1_IteratableContClass:
  def __init__(self):
    print "Ex1_IteratableContClass.__init__"
    self._vn = range(0,10)
    
  def elAt(self, nIdx):
    print "Ex1_IteratableContClass.elAt"
    return self._vn[nIdx]
    
  def __iter__(self):
    print "Ex1_IteratableContClass.__iter__"
    return Ex1_IteratorContClass(self)

class Ex1_IteratorContClass:
  def __init__(self, cont):
    print "Ex1_IteratorContClass.__init__"
    self._cont = cont
    self._nIdx = -1
    
  def next(self):
    print "Ex1_IteratorContClass.next"
    self._nIdx += 1
    try:
      return self._cont.elAt(self._nIdx)
    except IndexError:
      raise StopIteration

print
print sEx1Doc
print
for n in Ex1_IteratableContClass():
  print n,
  sys.stdout.flush()
  

sEx2Doc = \
"""
================================================================================
Example 2:
================================================================================

Let's say that we want to give two ways to iterate over the elements of our 
example container class. The default way is to iterate in direct order and we
want to add the possibility to iterate in reverse order. We simply add another
""iterator"" class. We do not want to modify the container class as, 
precisely,
the goal is to decouple the container from iteration.
"""
class Ex2_IteratableContClass:
  def __init__(self):
    print "Ex2_IteratableContClass.__init__"
    self._vn = range(0,10)
    
  def elAt(self, nIdx):
    print "Ex2_IteratableContClass.elAt"
    return self._vn[nIdx]
    
  def __iter__(self):
    print "Ex2_IteratableContClass.__iter__"
    return Ex1_IteratorContClass(self)

class Ex2_RevIteratorContClass:
  def __init__(self, cont):
    print "Ex2_RevIteratorContClass.__init__"
    self._cont = cont
    self._nIdx = 0
    
  def next(self):
    print "Ex2_RevIteratorContClass.next"
    self._nIdx -= 1
    try:
      return self._cont.elAt(self._nIdx)
    except IndexError:
      raise StopIteration

print
print sEx2Doc
print
print "Default iteration works as before"
print
for n in Ex2_IteratableContClass():
  print n,
  sys.stdout.flush()

print
print "But reverse iterator fails with an error : "
print

cont = Ex2_IteratableContClass()
try:
  for n in Ex2_RevIteratorContClass(cont):
    print n,
    sys.stdout.flush()
except:
  traceback.print_exc()

sEx3Doc = \
"""
================================================================================
Example 3.
================================================================================

The previous example fails with the "iteration over non sequence" error 
because
the "Ex2_RevIteratorContClass" iterator class doesn't support the __iter__
method. We therefore have to supply one, even at the price of only returning
""self"".

I think this is ugly because it baffles the distinction between iterators and
iteratables and we artificially have to add an ""__iter__"" method to the 
iterator itself, which should return ... well, an iterator, which it already 
is !

I presume that the rationale behind this is to keep the feature that the 
second
argument place of the ""for ... in ..."" should be filled by a container (i.e.
an iteratable), not an iterator.

Another way that Python might have done this without breaking existing code is 
to make the ""for ... in ..."" construct invoke the __iter__ method if it 
exists, otherwise, directly call the ""next"" method on the supplied object.

But this is only a minor quirk as the decoupling of the iterator from the 
iteratable is nonetheless achieved at the (small) price of adding an "almost"
useless method.

So here it is.
"""

class Ex3_RevIteratorContClass:
  def __init__(self, cont):
    print "Ex3_RevIteratorContClass.__init__"
    self._cont = cont
    self._nIdx = 0
    
  def __iter__(self):
    print "Ex3_RevIteratorContClass.__iter__"
    return self ## zut !
    
  def next(self):
    print "Ex3_RevIteratorContClass.next"
    self._nIdx -= 1
    try:
      return self._cont.elAt(self._nIdx)
    except IndexError:
      raise StopIteration

print
print sEx3Doc
print
cont = Ex2_IteratableContClass()
for n in Ex3_RevIteratorContClass(cont):
  print n,
  sys.stdout.flush()

sEx4Doc = \
"""
================================================================================
Example 4.
================================================================================

Since version 2.2, a new built-in function is offered that simply takes an 
iteratable as argument and returns an iterator by calling its "__iter__". As
long as only iteratables/iterators are involved, this is no big deal as
""iter(anIteratable)"" is strictly equivalent to ""anIteratable.__iter__()"".

The real advantage in using this function is that is also accepts containers
supporting the sequence protocol (the __getitem__() method with integer 
arguments starting at 0). The ""iter"" methods also returns an iterator for 
this
case. We can therefore write a function accepting a container as argument 
that,
with the same code, can either support the iteration protocol or the sequence
protocol.
"""

class Ex4_IteratableContClass:
  def __init__(self):
    print "Ex4_IteratableContClass.__init__"
    self._vn = range(0,10)
    
  def elAt(self, nIdx):
    print "Ex4_IteratableContClass.elAt"
    return self._vn[nIdx]
    
  def __iter__(self):
    print "Ex4_IteratableContClass.__iter__"
    return Ex4_IteratorContClass(self)

class Ex4_IteratorContClass:
  def __init__(self, cont):
    print "Ex4_IteratorContClass.__init__"
    self._cont = cont
    self._nIdx = -1
    
  def __iter__(self):
    print "Ex4_RevIteratorContClass.__iter__"
    return self ## zut !
    
  def next(self):
    print "Ex4_IteratorContClass.next"
    self._nIdx += 1
    try:
      return self._cont.elAt(self._nIdx)
    except IndexError:
      raise StopIteration

class Ex4_SequenceContClass:
  def __init__(self):
    print "Ex4_SequenceContClass.__init__"
    self._vn = range(0,10)
    
  def elAt(self, nIdx):
    print "Ex4_SequenceContClass.elAt"
    return self._vn[nIdx]
    
  def __getitem__(self, key):
    print "Ex4_IteratableContClass.__getitem__"
    return self.elAt(key)

def Ex4_FuncPrintContainerEl(cont):
  for n in iter(cont):
    print n,
    sys.stdout.flush()

print
print sEx4Doc
print
print
print "Applying Ex4_FuncPrintContainerEl to an iteratable container"
print
Ex4_FuncPrintContainerEl(Ex4_IteratableContClass())

print
print "Applying Ex4_FuncPrintContainerEl to a sequence container"
print
Ex4_FuncPrintContainerEl(Ex4_SequenceContClass())


sEx5Doc = \
"""
================================================================================
Example 5. 
================================================================================

Introducing Generators

The "generator" term itself is frequently used with two different meanings.
It is sometimes used to mean "generator function" and sometimes to mean
"generator-iterator". The PEP255 and python mailing list are the two only 
places
where I could read something that clearly distinguishes the two notions.

A generator-function can be first seen as only a notational artefact to 
generate
an iterator. The generated iterator is called a generator-iterator and it is,
by its intimate nature, both an iterator and an iteratable. (This might very
well be another reason why the iteration protocol baffles the distinction
betweeen iterators and iteratables.) Typically (but not necessarily), the
generator-iterator is such that the elements do not pre-exist but are 
"generated" as needed while iteration is going on.

Here is an example of a generator-function and the equivalent iterator class.
The function ""Ex5_GenList"" can be seen as just a notational shortcut to the 
equivalent Ex5_GenListIteratorClass class.
"""

def Ex5_GenList():
  print "Enters Ex5_GenList"
  i = 0
  while i < 10:
    print "Ex5_GenList yields -- Equivalent to have a it.next() return"
    yield i
    i += 1
  print "Ex5_GenList exits, equivalent to : raise StopIteration"
    
class Ex5_GenListIteratorClass:
  
  def __init__(self):
    print "Ex5_GenListIteratorClass.__init__"
    self._i = 0
    
  def __iter__(self):
    print "Ex5_GenListIteratorClass.__iter__"
    return self

  def next(self):
    print "Ex5_GenListIteratorClass.next"
    nReturn = self._i
    if nReturn > 9:
      print "Ex5_GenListIteratorClass.next raises the StopIteration exception"
      raise StopIteration
    self._i += 1
    return nReturn
    
print
print sEx5Doc
print
print "The type of Ex5_GenList() is : %s" % type(Ex5_GenList())
print "Although it is really meant generator-iterator"
print "As you can see, calling Ex5_GenList() didn't enter the function since 
our"
print "trace \"Enters Ex5_GenList\" had not been printed."
print "It only produced the generator-iterator."
print
print "for n in Ex5_GenList(): gives:"
print
for n in Ex5_GenList():
  print n,
  sys.stdout.flush()

print
print "The type of Ex5_GenList is : %s" % type(Ex5_GenList)
print "Although it is really meant generator-function"
print
print "for n in Ex5_GenList: gives:"
print
try:
  for n in Ex5_GenList:
    print n,
    sys.stdout.flush()
except:
  traceback.print_exc()

print
print "The type of Ex5_GenList().__iter__() is : %s" % 
type(Ex5_GenList().__iter__())
print
print "for n in Ex5_GenList().__iter__(): gives:"
print
for n in Ex5_GenList().__iter__():
  print n,
  sys.stdout.flush()

print
print "it = Ex5_GenList()"
print "for n in it(): gives:"
print
it = Ex5_GenList()
try:
  for n in it():
    print n,
    sys.stdout.flush()
except:
  traceback.print_exc()
print "The iterator produced by Ex5_GenList() is obviously not itself 
callable"

print
print "for n in Ex5_GenListIteratorClass(): gives:"
print
for n in Ex5_GenListIteratorClass():
  print n,
  sys.stdout.flush()

sEx6Doc = \
"""
================================================================================
Example 6
================================================================================

After having distinguished iteratable from iterator and generator-function 
from
generator-iterator, and having seen how generator-iterator makes iterable and
iterator undistinguishables, we are now ready for some real work. There is now
a real vocabulary problem in the Python community but I think that it is now
clear enough.

Our first pseudo-real-world example is to produce the Fibonacci sequence using
a simple generator.

The Ex6_Fibonacci() is a bit special as it never stops. This is one way to
implement something similar to infinite lists in python.
"""
def Ex6_Fibonacci():
  a = 1
  b = 1
  yield a
  yield b
  while True:
    bb = a + b
    a = b
    b = bb
    yield bb

print
print sEx6Doc
print
it = Ex6_Fibonacci()
for i in xrange(0,10):
  print it.next()

sEx7Doc = \
"""
================================================================================
Example 7
================================================================================
A beautiful algorithm to produce the fibonacci sequence goes by noticing 
that :
  
fib(i) =          1, 1, 2, 3, 5,  8,  13, 21, 34, 55,  89,  144, 233, 377, ...
fib(i+1) =        1, 2, 3, 5, 8,  13, 21, 34, 55, 89,  144, 233, 377, ...
fib(i)+fib(i+1) = 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, ...

Generators now enable us to ""run after our tail"". We will exploit the 
Fibonacci sequence property shown above by defining a function that produces
and returns a "list" while the production algorithm supposes the list as 
having
been already produced by recursively calling itself. In order to be able to do
this, we need to initiate the processus by explicitly giving the first values.
We also need that the list must be produced "lazily" ; i.e. the values must be 
produced only as needed ... well needed to produce the next element in the 
""list"" in our case.

Here is a first try that will almost work.
"""

## Without the traces:
##
## def Ex7_Fibonacci():
##   yield 1
##   yield 1
##   itTail = Ex7_Fibonacci()
##   itTail.next() # Skip the first one
##   for n in (head + tail for (head, tail) in izip(Ex7_Fibonacci(), itTail)):
##     yield n

N_STACK_DEPTH = 0
def Ex7_Fibonacci():
  global N_STACK_DEPTH
  N_STACK_DEPTH += 1
  nStackDepth = N_STACK_DEPTH
  
  print "[%d] Entering Ex7_Fibonacci" % nStackDepth
  print "[%d] Yielding 1" % nStackDepth
  yield 1
  print "[%d] Yielding 1" % nStackDepth
  yield 1
  itTail = Ex7_Fibonacci()
  itTail.next() # Skip the first one
  for n in (head + tail for (head, tail) in izip(Ex7_Fibonacci(), itTail)):
    print "[%d] Yielding %d" % (nStackDepth, n)
    yield n

print
print sEx7Doc
print
print list(islice(Ex7_Fibonacci(), 5))


sEx8Doc = \
"""
================================================================================
Example 8
================================================================================
Running after your tail with itertools.tee

Example 7 shows that although the idea is beautiful, the implementation is
very inefficient. 

To work efficiently, the beginning of the list must not be recomputed over
and over again. This is ensured in most FP languages as a built-in feature.
In python, we have to explicitly maintain a list of already computed results
and abandon genuine recursivity.

The "tee" function does just what we want. It internally keeps a generated
result for as long as it has not been "consumed" from all of the duplicated
iterators, whereupon it is deleted. You can therefore print the Fibonacci
sequence during hours without increasing memory usage, or very little.

The beauty of it is that recursive running after their tail FP algorithms
are quite straightforwardly expressed with this Python idiom.
"""

def Ex8_Fibonacci():
  print "Entering Ex8_Fibonacci"
  def _Ex8_Fibonacci():
    print "Entering _Ex8_Fibonacci"
    yield 1
    yield 1
    fibTail.next() # Skip the first one
    for n in (head + tail for (head, tail) in izip(fibHead, fibTail)):
      yield n
  fibHead, fibTail, fibRes = tee(_Ex8_Fibonacci(), 3)
  return fibRes
  
print
print sEx8Doc
print
print list(islice(Ex8_Fibonacci(), 5))




More information about the Python-list mailing list