For-each behavior while modifying a collection

Steven D'Aprano steve+comp.lang.python at pearwood.info
Thu Nov 28 22:22:45 EST 2013


On Thu, 28 Nov 2013 16:49:21 +0100, Valentin Zahnd wrote:

> It is clear why it behaves that way. Every time one removes an
> element, the length of the collection decreases by one while the counter
> of the for each statement does not. The questions are:
> 1. Why does the interpreter not use a copy of the collection to iterate
> over it? Are there performance reasons?

Of course. Taking a copy of the loop sequence takes time, possibly a 
*lot* of time depending on the size of the list, and that is a total 
waste of both time and memory if you don't modify the loop sequence. And 
Python cannot determine whether or not you modify the sequence. Consider 
this:

data = some_list_of_something
for item in data:
    func(item)


Does func modify the global variable data? How can you tell? Without 
whole-of-program semantic analysis, you cannot tell whether data is 
modified or not. Consider this one:

def func(obj):
    stuff = globals()['DA'.lower() + 'ta']
    eval("stuff.remove(obj)")


Do you expect Python to analyse the code in sufficient detail to realise 
that in this case, it needs to make a copy of the loop sequence? I don't. 
It is much better to have the basic principle that Python will not make a 
copy of anything unless you ask it to. You, the programmer, are in the 
best position to realise whether you are modifying the loop sequence and 
can decide whether to make a shallow copy or a deep copy.
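
As a minimal sketch of making that decision yourself: the loop below walks 
a shallow copy while removing from the original. The function name 
remove_odds and the sample data are made up for illustration.

```python
def remove_odds(data):
    """Remove odd numbers from data in place (hypothetical example)."""
    # list(data) makes a shallow copy. The loop walks the copy, so
    # removing items from the original cannot disturb the iteration.
    for item in list(data):
        if item % 2:
            data.remove(item)
    return data

numbers = [1, 3, 5, 6, 7, 9]
remove_odds(numbers)
print(numbers)
```

The copy costs time and memory proportional to the length of the list, 
which is exactly the price Python declines to pay on your behalf.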

It is a basic principle in programming that you shouldn't modify objects 
that you are traversing unless you are very, very careful. Given that, 
Python does the right thing here.


> 2. Why is the counter for the iteration not modified?

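What counter? There is no counter that your code, or the for statement, 
could adjust. You are iterating over an iterator, not running a C or 
Pascal "for i := 1 to 20" style loop. (A list iterator does keep track of 
its position internally, but that is private to the iterator.)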
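
Roughly speaking, a for loop asks the object for an iterator and then 
keeps calling next() on it until StopIteration is raised. The sketch 
below spells that out by hand with a made-up list, and removes each item 
as it is visited. The results noted in the comments are what CPython's 
list iterator gives; other objects may behave differently.

```python
data = ['a', 'b', 'c', 'd']
seen = []

it = iter(data)              # what "for item in data" does first
while True:
    try:
        item = next(it)      # each pass asks the iterator for the next item
    except StopIteration:
        break                # the iterator is exhausted, so the loop ends
    seen.append(item)
    data.remove(item)        # modify the list while it is being traversed

print(seen)   # ['a', 'c']: 'b' and 'd' were never visited
print(data)   # ['b', 'd']
```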

Even if there were a counter, how should it be modified? The code you 
showed was this:

def keepByValue(self, key=None, value=[]):
    for row in self.flows:
        if not row[key] in value:
            self.flows.remove(row)


What exactly does the remove() method do? How do you know?

self.flows could be *any object at all*; that won't be known until 
runtime. The remove method could do *anything*, and that won't be known 
until runtime either. Just because you, the programmer, expect that 
self.flows will be a list, and that remove() will remove at most one 
item, doesn't mean that Python can possibly know that. Perhaps self.flows 
is an instance of a subclass of list, and remove() will remove all of the 
matching items, not just one. Perhaps it is some other object, and rather 
than removing anything, it actually inserts extra items in the middle of 
the sequence. (There is no law that says that methods must do what they 
say they do.)
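
To make that concrete, here is a sketch of such a subclass. GreedyList is 
a hypothetical name invented for this example, not anything from your 
code or the standard library.

```python
class GreedyList(list):
    """Hypothetical list subclass whose remove() deletes every match."""

    def remove(self, value):
        if value not in self:
            # Keep the usual error for a missing value.
            raise ValueError("value not in list")
        # Slice assignment replaces the contents of this list in place.
        self[:] = [x for x in self if x != value]

flows = GreedyList([1, 2, 2, 2, 3])
flows.remove(2)
print(flows)   # [1, 3]
```

Nothing in a loop that calls flows.remove(row) tells the interpreter 
which of these remove() methods it is going to get.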

You are expecting Python to know more about your program than you do. 
That is not the case.
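
Since you do know what self.flows is, one possible rewrite is to build 
the list of rows to keep and swap it in afterwards, so nothing is removed 
in the middle of the loop. This is only a sketch: the Flows class, the 
keep_by_value name, and the assumption that each row is a dict are all 
guesses made for illustration.

```python
class Flows:
    """Hypothetical stand-in for the class in the original question."""

    def __init__(self, flows):
        self.flows = flows

    def keep_by_value(self, key, values=()):
        # Collect the rows to keep in a new list, then replace the
        # contents of self.flows in a single step.
        self.flows[:] = [row for row in self.flows if row[key] in values]

table = Flows([{'proto': 'tcp'}, {'proto': 'udp'},
               {'proto': 'udp'}, {'proto': 'icmp'}])
table.keep_by_value('proto', values=('tcp', 'icmp'))
print(table.flows)
```

The default for values is an empty tuple rather than a list, which avoids 
sharing one mutable default object between calls.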


-- 
Steven


