Tweaking flatten() (was: Re: itertools.flatten()? and copying generators/iterators.)

Francis Avila francisgavila at yahoo.com
Thu Oct 30 03:54:49 EST 2003


Here is my final attempt at flatten.  It's very powerful, but also a bit
more unwealdy...

It combines the sequence-altering ability of Peter's 'toiter' version, with
his class-iterability caching.  It can flatten arbitrary data types, and
manipulate them as it goes.  See the giant doc string for examples.

Idea for enhancement: instead of itercond being a single function, allow it
to be a sequence of functions, which act as filters, each called in turn.
(Exercise for the reader.)

(For a mint, allow the sequence of functions to be nested...)

Ok, that's enough. Here we go:

--- Code ---

# Peter Otten's 'toiter' and dict-lookup flatten()s, combined by
# Francis Avila.
def flatten_itercond(iterable, do_descend=None,
                       itercond=lambda seq:(None, seq)):
    """Yield leaves of nested iterable.

    Caveat: assumes that all instances of a class are iterable or not.
    To get around this, use an itercond function.

    do_descend is a dictionary of class:bool pairs.  The class of
    iterable is looked up in do_descend.  If its value is true,
    flatten iterates the item, else not.  Use this dictionary to set a
    lower bound to iteration.  Default is {''.__class__:False}, which
    ensures that strings are not broken up into characters.
    do_descend is added to as flatten goes along, to bypass checks
    for reoccuring classes' iterability.

    itercond is a function called when a lookup in do_descend fails.
    It must accept iterable, and return (isiterable, seq), where
    isiterable is one of (True, False, None) and seq is the (possibly
    modified) sequence which replaces iterable.  "True" means, "seq is
    iterable."  "None" means "seq.__class__ has consistent
    iterability: let do_descend optimize it."  "True or False" means
    the opposite, and will bypass do_descend completely.  Default is
    (None, iterable), i.e., let do_descend handle everything.

    Be careful with using 'None': it'll gain you some speed, but
    sometimes you return a seq that *you* may only use as
    *initerable,* but which is *intrinsically* iterable.  'None' does
    not let you control what do_descend thinks about seq's
    iterability, and this may lead to not reaching leaves you want to
    reach or recursing infinitely (if there's a string-like object
    around).  For example, if you return a tuple pair on leaves, the
    sequence is atomic to *you,* but not to do_descend.  See the
    leafdata function at the end of this docstring.

    Examples:
    >>> flatten = flatten_itercond  #for doctest
    >>> tree = [1, 2, ['This', 4, 'is', ['a']], 7, 'funky tree']
    >>> list(flatten(tree))
    [1, 2, 'This', 4, 'is', 'a', 7, 'funky tree']
    >>> # Now lets break up strings into characters.
    >>> def tochars(s):
    ...     # This one works by modifying s.
    ...     if isinstance(s, str):
    ...         # if not a char, split it into chars.
    ...         if len(s) != 1:
    ...             return True, tuple(s)
    ...         else:
    ...             # Return False
    ...             # to prevent adding strings to do_descend.
    ...             return False, s
    ...     else:
    ...         # flatten will sort this one out itself.
    ...         return None, s
    ...
    >>> def stopchars(s):
    ...     #This one works by stopping descent when we reach a char.
    ...     if isinstance(s, str):
    ...         if len(s) == 1:
    ...             return False, s
    ...         else:
    ...             return True, s
    ...     else:
    ...         return None, s
    ...
    >>> # Don't forget to turn off the default do_descend!
    >>> mres = list(flatten(tree, do_descend={}, itercond=tochars))
    >>> sres = list(flatten(tree, {}, stopchars))
    >>> if mres == sres:
    ...     print 'True'
    ...
    True
    >>> mres
    [1, 2, 'T', 'h', 'i', 's', 4, 'i', 's', 'a', 7, 'f', 'u', 'n', 'k', 'y',
' ', 't', 'r', 'e', 'e']
    >>> # You can filter out things if you're obsessed with speed,
    >>> # but better to do it *after* you get a flat list from
    >>> # flatten()...
    >>> def dropnum(s):
    ...     if isinstance(s, type(0)):
    ...         # Can't return nothing,
    ...         # so return something that *iterates* to nothing!
    ...         return True, ()
    ...     else:
    ...         return None, s
    ...
    >>> list(flatten(tree, itercond=dropnum))
    ['This', 'is', 'a', 'funky tree']


    Itercond's real power is with trees of arbitrary objects...

    >>> flatten=flatten_itercond #for doctest
    >>> class Node(object):
    ...     def __init__(self, label=None, data=None, children=()):
    ...         self.children = children
    ...         self.label = label
    ...         self.data = data
    ...     def __iter__(self):
    ...         return iter(self.children)
    ...
    >>> leaves = [Node(chr(i+65),i) for i in range(5)]
    >>> branches = [Node(chr(i+65), i, leaves) for i in range(5,10)]
    >>> root = [Node(chr(i+65), i, branches) for i in range(10,15)]
    >>> list(flatten(root))
    []
    >>> #We just listed the children of the leaves--of course it's empty!
    >>> def leafdata(n):
    ...     if isinstance(n, Node):
    ...         if not n.children:
    ...             return False, (n.label, n.data)
    ...         else:
    ...             #Strictly speaking, Node doesn't need an __iter__!
    ...             #We could return iter(n.children) here if we want.
    ...             return True, n
    ...     else:
    ...         return None, n # n is a list of Nodes.
    ...
    >>> list(flatten(root, itercond=leafdata))
    [('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4)]

    Use your imagination...

    """
    if do_descend is None:
        do_descend = {''.__class__:False, u''.__class__:False}
    try:
        descend = do_descend[iterable.__class__]
    except KeyError:

        # If itercond has something to say about iterable, use it and
        # don't add the lookup to do_descend.

        descend, iterable = itercond(iterable)
        if not descend is None:
            # If descend *is* None, class has consistent iterability.
            if descend:
                iterable = iter(iterable)
            # Else, descend is False.
        else:
            t = iterable.__class__
            try:
                iterable = iter(iterable)
            except TypeError:
                descend = do_descend[t] = False
            else:
                descend = do_descend[t] = True

    if not descend:
        yield iterable
    else:
        for elem in iterable:
            for subelem in flatten_itercond(elem, do_descend, itercond):
                yield subelem

--
Francis Avila





More information about the Python-list mailing list