altering an object as you iterate over it?

Paul McGuire ptmcg at austin.rr._bogus_.com
Fri May 19 16:18:05 EDT 2006


"John Salerno" <johnjsal at NOSPAMgmail.com> wrote in message
news:D%obg.2140$No6.46498 at news.tufts.edu...
> John Salerno wrote:
> > What is the best way of altering something (in my case, a file) while
> > you are iterating over it? I've tried this before by accident and got an
> > error, naturally.
> >
> > I'm trying to read the lines of a file and remove all the blank ones.
> > One solution I tried is to open the file and use readlines(), then copy
> > that list into another variable, but this doesn't seem very efficient to
> > have two variables representing the file.
> >
> > Perhaps there's also some better to do it than this, including using
> > readlines(), but I'm most interested in just how you edit something as
> > you are iterating with it.
> >
> > Thanks.
>
> Slightly new question as well. here's my code:
>
> phonelist = open('file').readlines()
> new_phonelist = phonelist
>
> for line in phonelist:
>      if line == '\n':
>          new_phonelist.remove(line)
>
> import pprint
> pprint.pprint(new_phonelist)
>
> But I notice that there are still several lines that print out as '\n',
> so why doesn't it work for all lines?

Okay, so it looks like you are moving away from modifying a list while
iterating over it.  In general it is good practice to *not* modify a list
while iterating over it (although if you *must* do this, it is possible:
just iterate from back to front instead of front to back, so that deletions
don't mess up your "next" pointer).
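
For the back-to-front approach, here is a minimal sketch.  The sample list is
made up and simply stands in for the lines of a file; deleting by index with
del is one way to do it, not the only one.

```python
# Sample data standing in for the lines of a file (made up for illustration).
lines = ['alice 555-1234\n', '\n', '\n', 'bob 555-9876\n', '\n']

# Walk the indexes from the last one down to 0.  Deleting item i only
# shifts the items *after* i, and those have already been visited.
for i in range(len(lines) - 1, -1, -1):
    if lines[i] == '\n':
        del lines[i]

print(lines)
```

Because the loop works on indexes rather than values, consecutive blank
lines are each visited and removed.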

Your coding style is a little dated - are you using an old version of
Python?  This style is the old-fashioned way:

noblanklines = []
lines = open("filename.dat").readlines()
for line in lines:
    if line != '\n':
        noblanklines.append(line)

1. open("xxx") is fine, and it is the recommended way to get a file object
(the file() spelling exists only in Python 2 and was removed in Python 3)
2. the file object is itself an iterator, so no need to invoke readlines
3. no need for such a simple for loop, a list comprehension will do the
trick - or even a generator expression passed to a list constructor.

So this construct collapses down to:

noblanklines = [ line for line in open("filename.dat") if line != '\n' ]
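
If you also want the file closed promptly, a sketch along these lines should
work, assuming a Python recent enough to have the with statement.  The
temporary file and its contents are made up so that the example runs on its
own; substitute your real filename.

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.dat', delete=False) as tmp:
    tmp.write('alice 555-1234\n\nbob 555-9876\n\n\ncarol 555-0000\n')
    path = tmp.name

# Iterate over the file object directly; the with block closes it afterwards.
with open(path) as f:
    noblanklines = [line for line in f if line != '\n']

os.remove(path)  # clean up the throwaway file
print(noblanklines)
```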


Now to your question about why '\n' lines persist into your new list.  The
answer is - you are STILL UPDATING THE LIST YOU ARE ITERATING OVER!!!
Here's your code:

new_phonelist = phonelist

for line in phonelist:
     if line == '\n':
         new_phonelist.remove(line)

phonelist and new_phonelist are just two names bound to the same list!  If
you have two consecutive '\n's in the file (say lines 3 and 4), then
removing the first (line 3) shortens the list by one, so that line 4 becomes
the new line 3.  Then you advance to the next line, being line 4, and the
second '\n' has been skipped over.
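
You can see both effects in a small experiment.  The sample list below is
made up; the second half shows one possible fix, taking a real copy with a
slice, and is only a sketch of that idea.

```python
phonelist = ['a\n', '\n', '\n', 'b\n']
new_phonelist = phonelist                  # a second name, NOT a copy
same_object = new_phonelist is phonelist   # True: both names, one list

for line in phonelist:
    if line == '\n':
        new_phonelist.remove(line)

# The second of the two consecutive '\n' entries was skipped over.
leftover = list(phonelist)

# With a real copy, the list being iterated is never modified.
phonelist2 = ['a\n', '\n', '\n', 'b\n']
cleaned = phonelist2[:]                    # slice makes a new list
for line in phonelist2:
    if line == '\n':
        cleaned.remove(line)

print(same_object, leftover, cleaned)
```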

Also, don't confuse remove with del.  new_phonelist.remove(line) does a
search of new_phonelist for the first matching entry of line.  We know line
= '\n' - all this is doing is scanning through new_phonelist and removing
the first occurrence of '\n'.  You'd do just as well with:

numEmptyLines = lines.count('\n')
for i in range( numEmptyLines ):
    lines.remove('\n')

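You could write this more compactly:

for i in range( lines.count('\n') ):
    lines.remove('\n')

The two forms behave the same: range( lines.count('\n') ) is evaluated just
once, before the loop starts, so the count is not recomputed on each pass.
Either way, each remove() call rescans the list from the front, so this is
slow on long lists.  Talk about sucky performance!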

You might also want to strip whitespace from your lines - I expect while you
are removing blank lines, a line composed of all spaces and/or tabs would be
equally removable. Try this:

lines = [ line.rstrip() for line in open("XYZZY.DAT") ]
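
Note that once the lines are stripped, a blank line is '' rather than '\n',
so the test for "blank" changes.  A rough sketch, using io.StringIO with
made-up contents in place of a real file:

```python
import io

# io.StringIO stands in for an open file here; the contents are made up.
fake_file = io.StringIO('alice 555-1234\n   \n\t\nbob 555-9876  \n\n')

# rstrip() removes the trailing newline too, so an all-whitespace line
# becomes the empty string, which is false in a boolean test.
stripped = [line.rstrip() for line in fake_file]
nonblank = [line for line in stripped if line]

print(nonblank)
```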

-- Paul




