2.3 list reverse() bug?

Sat Dec 27 18:00:49 EST 2003

Bjorn Pettersen wrote:
   ...
>> Here's my candidate for "most frequent scenario where copies are needed":
>> 
>> for item in somelist:
>>     if froobable(item):
>>         somelist.append(frooble(item))
>>     elif uncadable(item):
>>         somelist.remove(item)
>> 
>> i.e.: you want to modify the very list you're looping on.  This is one
>> of Python's few but important "traps and pitfalls".
> 
> That is an interesting version in that it iterates while either (1)
> appending or (2) removing. I.e. froobable(frooble(item)) must always be
> false to prevent infinite recursion.

It doesn't have to _always_ be false -- some frooble(item)'s might be
froobable, as long as each such chain always ends in a non-froobable.

> I don't think I've seen one of these
> and it made me curious, do you have a concrete example?

Roughly, yes, though not quite as direct and obvious as this one.  E.g.:

for listener_method in this_object.registered_listeners:
    listener_method(this_object, change_hint)

where some of the registered listener methods were trying to deregister
themselves and/or register some other listener(s).  This hidden case of
"mutating the list being iterated on" was a rather serious bug, but it
only led to infinite recursion (and hence to discovery) when a subtle
combination of circumstances led to addition of unlimited number of new
listeners.  In a simpler variation, twisted's woven component had a
list of _weak_ references to listeners giving potential for similar
misbehavior (now fixed by the usual, simple approach of looping over a
copy of the list).

Sometimes I wonder if it might not be worth giving such "fragile" 
iterables as lists and dicts the ability to emit warnings when mutated,
while they have iterators outstanding, in the ways that are likely to
misbehave (adding or removing items to a list, or keys to a dict --
some container mutations, namely change of what value corresponds to
a fixed key in a dict, are obviously benign in this sense).

> My first instinct would be a "functional" approach, creating a new correct
> list and abandoning the old incorrect version:
> 
>   # O(n)
>   tmp = []
>   for item in somelist:
>       if shouldModify(item):
>            tmp.append(mutate(item))
>       elif shouldRemove(item):
>            pass
>       else:
>            tmp.append(item)
>   somelist = tmp

Sure, this is often a feasible approach (assuming mutate(item) returns
the value one wants in lieu of item, of course).  It's often particularly
attractive to express it as a list comprehension:

def possibly_mutated(item):
    if shouldModify(item):
        mutate(item)         # no assumption it returns the value
    return item

somelist[:] = [ possibly_mutated(item) for item in somelist
                if not shouldRemove(item) ]

(I'm changing somelist in-place, rather than just rebinding the
name, but of course depending on one's needs name rebinding can
sometimes work just as well).

>   # O(n**2)
>   i = 0
>   while i < len(somelist):
>       if shouldModify(item):

this would raise a NameError about item, so I don't think this is
the solution you wanted to post -- perhaps missing an
        item = somelist[i]
?

>           somelist[i] = mutate(item)
>           i += 1
>       elif shouldRemove(item):
>           del somelist[i]
>       else:
>           i += 1

Yeah, the (possibly repeated) 'del' in the middle of the list has
bad performance (even worse, of course, is the somelist.remove I
had in the original example).  Might as well keep two indices (and
enumerate lets you update manually just one of them...):

   itarget = 0
   for isrc, item in enumerate(somelist):
       if shouldModify(item):
           somelist[itarget] = mutate(item)
           itarget += 1
       elif shouldRemove(item):
           pass
       else:
           somelist[itarget] = item
           itarget += 1
   del somelist[itarget:]

possibly more elegantly expressed, if the tests can be applied
in different order without affecting their outcome, as:

   itarget = 0
   for isrc, item in enumerate(somelist):
       if shouldRemove(item):
           continue
       if shouldModify(item):
           item = mutate(item)     # or just mutate(item) ...
       somelist[itarget] = item
       itarget += 1
   del somelist[itarget:]

> I thought I found a solution using enumerate, but it was a mirage ;-) So

Maybe one of the above ones...?  They're both O(N), btw.

> Returning to the example in the beginning, the functional version would be
> 
>   tmp = []; tail = []
>   for item in somelist:
>       if froobable(item):
>           tail.append(frooble(item))
>       elif uncadable(item):
>           pass
>       else:
>           tmp.append(item)
>   somelist = tmp + tail
> 
> provided froobable(frooble(item)) is always false. If that is not the

You also need to assume uncadable(frooble(item)) is also always false,
otherwise the original version would remove the frooble(item) (after first
adding it) and this one wouldn't.

> case, it would make a functional solution messier. (but I think that would

Not all that much, I think -- you just need to recurse on tail (the end
case for the recursion is when tail is empty), i.e. something like:

def froobanduncad(somelist):
    tmp = []; tail = []
    for item in somelist:
        if froobable(item):
            tail.append(frooble(item))
        elif uncadable(item):
            pass
        else:
            tmp.append(item)
    if tail:
        froobanduncad(tail)
    somelist[:] = tmp + tail

of course, you DO need to change-in-place (or return the result) -- just
rebinding the argument name would be little use;-).

> be a rather perverse problem space ;-)

As long as the recursion eventually terminates, not particularly, IMHO.

> The case I've seen come up repeatedly is a little simple, only involving
> deletions of the list:
> 
>   x = range(30)
>   for i in x:
>       if 5 <= i <= 10: # try to remove 5,6,7,8,9,10
>           x.remove(i)

You're luckier than I am, I think -- in the buggy cases I see, there
is often some 'else:' branch to the if guarding the removal, as well
as some extra action around the removal itself -- and it's not 
necessarily the case that the loop can be split into a list comprehension
first (to weed out the items that need to be removed) and a loop later (for
the actions on surviving items) because the order in which things are
performed can be significant (sometimes it's quite a bother to check
whether it is, or isn't, significant...).  You can handle it with appending
nonremoved items to an empty list (unless order of execution including
possible __del__ special methods bites you...) -- but simply looping on
a copy is the simplest, least-invasive modification you can perform on
the subtly buggy code to make it non-buggy.

> the above removes 6,8,10 instead. The solution doesn't require a copy to
> iterate over though, a list comprehension will do (as will calling filter
> if you want to go against the times)::
> 
>   x = [n for n in x if not 5 <= n <= 10]

If removal is ALL you need, yes -- and in this case a LC or the like is
also far more efficient (the .remove method is a dog...;-).

> # I realize there are quite a number of Python programmers that aren't
> # CS majors. Big-O notation [ O(n) and O(n**2) above ] is simply a

I'm not a CS major, for example, but that doesn't imply I'm not pretty
good at O()-notation anyway...;-).  (Some of us engineers aren't all
that bad, you know...).

> # I'll make another absolute statement <wink> and say that so few
> # people will work with lists big enough to ever care that it's
> # irrellevant largely  :-)

It doesn't take much for O(N*N) to bite, in cases, such as this one,
where the multiplicative constants aren't all that different from
the O(N) possibilities.  I.e., lists of a few hundreds items are very
common in many programs.  Removing half the items from a list with
.remove takes about 670 usec [[on my machine, according to timeit.py]],
while a LC takes 115 usec, for a list of length 200; when the list's
length is 400, the time is 220 for the LC, 2100 for the .remove's.
These levels of performance hits can often be quite relevant, IMHO.

Alex