itertools.flatten()? and copying generators/iterators.

Raymond Hettinger vze4rx4y at verizon.net
Tue Oct 28 04:04:19 EST 2003


[Francis Avila]
> Below is an implementation a 'flattening' recursive generator (take a nested
> iterator and remove all its nesting). Is this possibly general and useful
> enough to be included in itertools? (I know *I* wanted something like it...)

Interesting post!

I'll add your idea to the list of itertool suggestions received so far.  My
first take
is that it more likely to be added to the "recipes" in the examples section.

Core itertools should be primitive building blocks that combine with one
another to make other tools.  Also, I'm trying to keep the toolset as small
as possible by excluding new tools that can be easily and efficiently coded
in pure python.

I would code it like this:

def flatten(s):
    try:
        iter(s)
    except TypeError:
        yield s
    else:
        for elem in s:
            for subelem in flatten(elem):
                yield subelem

As your examples show, it does have some suprising behavior in that
strings get split apart instead of staying intact.  That is easily taken care of
by an AtomicString subclass:

class AtomicString(str):
    def __iter__(self):
        raise TypeError

a = [1, 2, AtomicString('abc'), 4]



> >>> # The following I'm not sure what to do about...
> >>> empty = [1, [], 3]
> >>> emptyiter = [1, iter([]), 3]

The above code definition handles this without a problem.



> I'm having some problems determining whether a
> given object will iterate infinitely, if that object is already a
> generator/iterator.

In general, that is not knowable.



>  Basically, I'm unable to copy an iterator (why?).

Alex Martelli just wrote a PEP about making many iterators copyable.
See www.python.org/peps/pep-0323.html

Until that is adopted, your best bet is to use tee():

def tee(iterable):
    "Return two independent iterators from a single iterable"
    def gen(next, data={}, cnt=[0]):
        dpop = data.pop
        for i in count():
            if i == cnt[0]:
                item = data[i] = next()
                cnt[0] += 1
            else:
                item = dpop(i)
            yield item
    next = iter(iterable).next
    return (gen(next), gen(next))


> Also, why is the iterator type not included in the types module or described
> in the language reference (Python 2.2)?

There is no iterator type.  Iterators can be any object that supports the
iterator
protocol: __iter__() and next().  Generators, container iterators, and each of
the itertools define their own type.


This was a long but highly instructive pair of posts.  I hope it becomes
widely read.  Your work ventured   into previously uncharted
territory and touched on a number of frontier issues like copyability,
determining whether something is iterable, the various iterator types,
and the deep mathematical subject of determining whether a given
process is finite.


Raymond Hettinger






More information about the Python-list mailing list