Deleting from a list while iterating

Rhamphoryncus rhamph at gmail.com
Sun Dec 3 10:37:15 EST 2006


The problems of this are well known, and a suggestion for making this
easier was recently posted on python-dev.  However, I believe this can
be done just as well without a change to the language.  What's more,
most of the suggested methods (in my search results as well as the
suggestion itself) do not scale well, which my approach would solve.

My approach is to make a set of indexes to removed while iterating,
then use a list comprehension to filter them out after.  Timings of
this and two other common approaches follow:

setapproach = """\
def func(count):
    from random import random
    items = [random() for i in xrange(count)]
    remove = set()
    for index, x in enumerate(items):
        #...do something...
        if x < 0.5:
            remove.add(index)
    items = [x for index, x in enumerate(items) if index not in remove]
"""

copyapproach = """\
def func(count):
    from random import random
    items = [random() for i in xrange(count)]
    for x in items[:]:
        if x < 0.5:
            items.remove(x)
"""

reverseapproach = """\
def func(count):
    from random import random
    items = [random() for i in xrange(count)]
    for index in range(len(items) - 1, -1, -1):
        if items[index] < 0.5:
            del items[index]
"""

>>> import timeit
>>> timeit.Timer(stmt='func(1000)', setup=setapproach).timeit(1)
0.0016040802001953125
>>> timeit.Timer(stmt='func(1000)', setup=copyapproach).timeit(1)
0.0085191726684570312
>>> timeit.Timer(stmt='func(1000)', setup=reverseapproach).timeit(1)
0.0011308193206787109

>>> timeit.Timer(stmt='func(10000)', setup=setapproach).timeit(1)
0.021183013916015625
>>> timeit.Timer(stmt='func(10000)', setup=copyapproach).timeit(1)
1.0268981456756592
>>> timeit.Timer(stmt='func(10000)', setup=reverseapproach).timeit(1)
0.038264989852905273

>>> timeit.Timer(stmt='func(100000)', setup=setapproach).timeit(1)
0.23896384239196777
>>> timeit.Timer(stmt='func(100000)', setup=copyapproach).timeit(1)
274.57498288154602
>>> timeit.Timer(stmt='func(100000)', setup=reverseapproach).timeit(1)
2.2382969856262207

As you can see, although reverse iteration is somewhat faster at
smaller sizes, a set is substantially faster at larger sizes, and I
believe is more readable anyway.  Copying shouldn't even be considered
unless you know the size will always be trivial (< 1000).

I'm sure there's a few other approaches that would do even better under
certain conditions.  One is a generator, if your input and output
should both be iterators.  Another is using slicing to move contiguous
sections of retained items over the removed items.  I leave both of
these as an exercise for the reader.

-- 
Adam Olsen, aka Rhamphoryncus




More information about the Python-list mailing list