when is the filter test applied?

ROGER GRAYDON CHRISTMAN dvl at psu.edu
Tue Oct 3 16:31:15 EDT 2017


On Tue, Oct  3, 2017 15:38:42, Neal Becker <ndbecker2 at gmail.com> wrote:
>
>I'm not certain that it isn't behaving as expected - my code is quite
>complicated.
>
>On Tue, Oct 3, 2017 at 11:35 AM Paul Moore <p.f.moore at gmail.com> wrote:
>
>> My intuition is that the lambda creates a closure that captures the
>> value of some_seq. If that value is mutable, and "modify some_seq"
>> means "mutate the value", then I'd expect each element of seq to be
>> tested against the value of some_seq that is current at the time the
>> test occurs, i.e. when the entry is generated from the filter.
>>
>> You say that doesn't happen, so my intuition (and yours) seems to be
>> wrong. Can you provide a reproducible test case? I'd be inclined to
>> run that through dis.dis to see what bytecode was produced.
>>
>> Paul
>>
>> On 3 October 2017 at 16:08, Neal Becker <ndbecker2 at gmail.com> wrote:
>> > In the following code (python3):
>> >
>> > for rb in filter (lambda b : b in some_seq, seq):
>> >   ... some code that might modify some_seq
>> >
>> > I'm assuming that the test 'b in some_seq' is applied late, at the start of
>> > each iteration (but it doesn't seem to be working that way in my real code),
>> > so that if 'some_seq' is modified during a previous iteration the test is
>> > correctly performed on the latest version of 'some_seq' at the start of each
>> > iteration.  Is this correct, and is this guaranteed?
>
>

I think you must always be very careful about modifying anything
that you are iterating over -- if not avoid that sort of thing entirely.
In particular, avoid anything that might add or remove elements
in the middle of the sequence.

YMMV, but you can imagine the iterator behaving in one of these ways
(all naturally implementation-dependent):

Traversing an indexable sequence (such as a list or string)
might maintain an internal counter that increments for each iteration.
Traversing a linked sequence (such as a linked list)
might maintain an internal reference that advances down the chain.
If you were to remove the element that you were currently visiting
(as in "for c in x: x.remove(c)"), you may end up
missing items.  For example, if you remove element 10 from a
list, element 11 moves forward to position 10, but the
internal counter mentioned above would still advance to 11 --
skipping over the item that followed what you removed.
Similarly, if you were to remove the linked list node that you
are currently visiting, it might be unspecified where the
linked list advancement would take you next.
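The counter-based traversal described above can be sketched roughly as
follows. This is only a model of the idea, not how any particular Python
implementation is written; the function name remove_while_iterating and
the sample data are made up for illustration.

```python
def remove_while_iterating(items, target):
    """Walk a list with an explicit index counter, deleting matches as we go."""
    visited = []
    i = 0  # stands in for the iterator's hypothetical internal counter
    while i < len(items):
        current = items[i]
        visited.append(current)
        if current == target:
            del items[i]  # everything after position i shifts left by one
        i += 1  # the counter still advances, stepping over the shifted item
    return visited


data = [1, 2, 2, 3]
visited = remove_while_iterating(data, 2)
print(visited)  # [1, 2, 3] -- the second 2 was never visited
print(data)     # [1, 2, 3] -- so it was never removed
```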

It would be safer to make a deep copy of the iterable data
and iterate down the copy, so you can be sure that all your 
intended changes to the original data do not interfere with 
your iteration.

Here is an illustration of something going awry:

>>> x = list('aaabaaaacaaaaadaaaaaae')
>>> for c in x:
...     if c == 'a':
...         x.remove('a')
...
>>> ''.join(x)
'bcaadaaaaaae'

Of the original 18 a's, only 10 were removed.
The test for (c == 'a') was applied to only about every other a,
because each deletion shifted the remaining items forward
while the iterator kept advancing.
(And of course, it always deleted the first a in the list,
even if that is not where my iterator was looking.)
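For comparison, here is the same loop run over a copy of the list, as
suggested earlier. This is a minimal sketch: for a flat list of
one-character strings a shallow copy via list(x) seems to be enough, and
copy.deepcopy would only matter if the elements themselves were being
mutated.

```python
x = list('aaabaaaacaaaaadaaaaaae')

# Iterate over a snapshot, so removals from x cannot disturb the traversal.
for c in list(x):
    if c == 'a':
        x.remove('a')

result = ''.join(x)
print(result)  # bcde
```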


And yes, I would expect the filter to be a lazy application
to the iterable you are filtering, so anything that removes
from your iterable is going to cause your filter to skip something.
That can also be fixed by copying the iterable first.
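A small experiment along the lines of the original question suggests the
test really is applied late, one element at a time. The values below are
made up for illustration; some_seq and seq just reuse the names from the
question.

```python
some_seq = [1, 2, 3, 4]
seq = [1, 2, 3, 4]
seen = []

for b in filter(lambda b: b in some_seq, seq):
    seen.append(b)
    if b == 1:
        # Mutate the list the lambda consults, before 3 has been tested.
        some_seq.remove(3)

print(seen)  # [1, 2, 4] -- 3 was tested after the removal and rejected
```

If the test had been applied to all of seq up front, 3 would still have
appeared in the output.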

Or alternatively, you can go more functional and instead
create a new iterable set of data from the old, instead of
mutating what you have.
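As a sketch of that more functional style, applied to the same sample
data as above (the name without_a is just for illustration):

```python
x = list('aaabaaaacaaaaadaaaaaae')

# Build a new list from the old one; x itself is never modified.
without_a = [c for c in x if c != 'a']

print(''.join(without_a))  # bcde
print(len(x))              # 22 -- the original is untouched
```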

Roger Christman
Pennsylvania State University




