[Tutor] Query: lists

Cameron Simpson cs at cskk.id.au
Tue Aug 14 17:38:01 EDT 2018


On 14Aug2018 18:11, Deepti K <kdeepti2013 at gmail.com> wrote:
> when I pass ['bbb', 'ccc', 'axx', 'xzz', 'xaa'] as words to the below
>function, it picks up only 'xzz' and not 'xaa'
>
>def front_x(words):
>  # +++your code here+++
>  a = []
>  b = []
>  for z in words:
>    if z.startswith('x'):
>      words.remove(z)
>      b.append(z)
>      print 'z is', z
>  print 'original', sorted(words)
>  print 'new', sorted(b)
>  print sorted(b) + sorted(words)

That is because you are making a common mistake which applies to almost any 
data structure, but is particularly easy with lists and loops: you are 
modifying the list _while_ iterating over it.

After you go:

  words.remove(z)

all the elements _after_ z (i.e. those after 'xzz', which here is just 
['xaa']) are moved down the list by one position.

In your particular case, that means that 'xaa' is now at index 3, and the next 
iteration of the loop would have picked up position 4. Therefore the loop 
doesn't get to see the value 'xaa'.

A "for" loop and almost anything that "iterates" over a data structure does not 
work by taking a copy of that structure ahead of time, and looping over the 
values. This is normal, because a data structure may be of any size - you do 
not want to "make a copy of all the values" by default - that can be 
arbitrarily expensive.

Instead, a for loop obtains an "iterator" of what you ask it to loop over. The 
iterator for a list effectively has a reference to the list (in order to obtain 
the values) and a notion of where in the list it is up to (i.e. a list index, a 
counter starting at 0 for the first element and incrementing until it exceeds 
the length of the list).
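
To make that concrete, here is a rough sketch of what a list iterator does. The 
function name "iterate_by_index" is made up for illustration, and the real 
iterator is implemented inside the interpreter, so treat this as a model of 
the behaviour rather than the actual implementation:

```python
def iterate_by_index(items):
    """Yield items by position, roughly the way a list iterator does."""
    index = 0
    # The length is checked again on every step, so a list that shrinks
    # during the loop ends the loop early.
    while index < len(items):
        yield items[index]
        index += 1

words = ['bbb', 'ccc', 'axx', 'xzz', 'xaa']
seen = []
for z in iterate_by_index(words):
    seen.append(z)
    if z.startswith('x'):
        words.remove(z)   # shifts every later element down by one

print(seen)
print(words)
```

With your list, "seen" never contains 'xaa', and 'xaa' is still in "words" 
afterwards, which matches what your function does.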

So when you run "for z in words", the iterator is up to index 3 when you reach 
"xzz". So words[3] == "xzz". After you remove "xzz", words[3] == "xaa" and in 
this case there is no longer a words[4] at all because the list is shortened. 
So the next loop iteration never inspects that value. Even if the list had more 
values, the loop would still skip the "xaa" value.
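
You can check this with an ordinary for loop. This small experiment uses your 
list with one extra value, 'ddd', which I have added purely for illustration:

```python
words = ['bbb', 'ccc', 'axx', 'xzz', 'xaa', 'ddd']
visited = []
for z in words:
    visited.append(z)
    if z.startswith('x'):
        # Removing here shifts 'xaa' into the slot the loop has just
        # finished with, so the next step lands on 'ddd' instead.
        words.remove(z)

print(visited)
print(words)
```

The loop visits 'ddd' but never 'xaa', so 'xaa' is left behind in "words".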

You should perhaps ask yourself: why am I removing values from "words"?

If you're just trying to obtain the values starting with "x" you do not need to 
modify words because you're already collecting the values you want in "b".

If you're trying to partition words into values starting with "x" and values 
not starting with "x", you're better off making a separate collection for the 
"not starting with x" values. And that has me wondering what the list "a" in 
your code was for originally.
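
As a sketch of that second approach, here is one way to partition the words 
without touching the list you were given. The names "x_words" and 
"other_words" are my own, and I am assuming you want the function to return 
the combined list rather than print it:

```python
def front_x(words):
    x_words = []       # values starting with 'x'
    other_words = []   # everything else
    for z in words:
        if z.startswith('x'):
            x_words.append(z)
        else:
            other_words.append(z)
    # "words" itself is never modified, so the loop sees every value.
    return sorted(x_words) + sorted(other_words)

my_words = ['bbb', 'ccc', 'axx', 'xzz', 'xaa']
result = front_x(my_words)
print(result)
print(my_words)
```

Here "result" has the "x" words first, each group sorted, and "my_words" is 
the same after the call as it was before.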

As a matter of principle, functions that "compute a value" (in your case, a 
list of the values starting with "x") should try not to modify what they are 
given as parameters. When you pass values to Python functions, you are passing 
a reference, not a new copy. If a function modifies that reference's _content_, 
as you do when you go "words.remove(z)", you're modifying the original.

Try running this code:

  my_words = ['bbb', 'ccc', 'axx', 'xzz', 'xaa']
  print 'words before =', my_words
  front_x(my_words)
  print 'words after =', my_words

You will find that "my_words" has been modified. This is called a "side 
effect", where calling a function affects something outside it. It is usually 
undesirable.

Cheers,
Cameron Simpson <cs at cskk.id.au>

