Conventions for iterators

Tim Peters tim.one at home.com
Fri Feb 1 04:17:37 EST 2002


[jlbec at evilplan.org]
> 	I was wondering what (if any) were the conventions used when
> generating iterators.  I know the next()/StopIteration protocol.  I'm
> speaking more of consistency conventions.
> 	Say I have a list-like object containing the objects (obj1,
> obj2, obj3, obj4, obj5).  Each successive iteration returns the next
> object.  Pretty simple.  However, what if I call
> "listthing.delete(cur_obj)", where cur_obj is the current object
> returned by the iteration?  There are many things that *could* happen
> (let's say that cur_obj is obj3).
>
> 1. Stop iterating, because the change renders the iterator invalid.
> 2. The next element (obj4) is correctly iterated.
> 3. The internal structure isn't exactly a list, so the next element
>    (when queried) now may be obj2 again, or might skip obj4 to obj 5)

People who mutate a container while iterating over it are living
dangerously, and Python doesn't try to stop them from blowing off their
feet, except to guarantee that this doesn't happen:

> 4. The list-like object is broken due to a state inconsistency created
>    here.

Short of that, anything goes:  the behavior is undefined.  It may be any of
your 1-3, or even a mix.  2.2 introduced dict iterators (dict.iteritems()
and its friends), and they currently make a good-faith attempt at your #1,
but only so far as they can while remaining cheap.  IOW, you *may* get your
#2 and #3 behaviors at times when mutating a dict while iterating over it,
but will usually get #1.  Here's a contrived example; nothing about its
behavior is guaranteed across releases:

>>> d = {1:1, 2:2}
>>> for k in d:
...     print k
...     del d[k]
...     d[10*k] = 10*k
...
1
2
20
10
>>> d
{200: 200, 100: 100}
>>>

If you can't live with that, good!  Do the sensible thing and iterate over a
*copy* (for k in d.keys(); for k, v in d.copy(); etc).

Mutating sequences while iterating over them is well-defined, but not via
any of your 1-3.  You can read "the rules" in section 7.3 ("The 'for'
statement") of the Language Reference manual.  Relying on those rules is
generally stupid, but I confess I've often used

all = [root]
d = {root: 1}
for node in all:
    for child in node.children():
        if child not in d:
            d[child] = 1
            all.append(child)

to get a breadth-first list of the nodes in a rooted, directed graph.  This
relies on that growing a list while iterating over it works exactly the way
it's defined to work.

Most often, though, it's again best to iterate over list[:] (i.e., a copy),
if you do size-changing mutations during iteration.

> 	Now, obviously, 3 and 4 are bad.

#4 is right out:  no way.  #3 doesn't offend Pythonic sensibilities, though.

> ...
> With "one modification invalidates the iterator," you cannot have
> code like:
>
> for item in list:
>     if matches(item):
>         list.remove(item)

Code like that won't complain today, but won't do what you probably hope for
either.  Change it to "for item in list[:]" and there's not even a
possibility for confusion.

> So, obviously, behavior 2 is "best."

It would be in Scheme, but not in Python.  Python enjoys making tradeoffs
that drive *someone* crazy <wink>.  Since the semantics of mutating while
iterating are subtle even in an "anally correct" implementation, Python
punts and gives you a different choice:  accept whatever you get, or make
the issues utterly clear by iterating over a copy.  In practice I've never
found this to be a real limitation (and have found many clever ways to avoid
mutating while iterating <wink>), but it is true that *everyone* gets burned
by your list.remove() example once.

> But I was wondering what the python community expects/requires.

Most of all, they want clarity.  If you can't easily explain what mutating
during iteration will do, exactly, leave it undefined beyond guaranteeing
the object remains sane "no matter what".  Then if you can gripe if they try
to do something undefined, so much the better, but that's not worth gross
complications or slowdowns in your implementation.  Python doesn't stick its
head up its ass here trying to swallow mutatarrhea, so why should you
<wink>.

caveat-mutator-ly y'rs  - tim





More information about the Python-list mailing list