Warning about "for line in file:"

Aldo Cortesi aldo at nullcube.com
Sat Feb 16 08:44:44 EST 2002


Thus spake Brian Kelley (bkelley at wi.mit.edu):

> Neil Schemenauer wrote:
> 
> >Russell E. Owen wrote:
> >
> >>The readline or xreadline file methods work fine, of course.
> >>
> >
> >Why "of course"?  iter(file) does the same thing as file.xreadlines().
> >Have you tested xreadlines?
> >
> >  Neil
> >
> >
> 
> I had the same problem with xreadlines, but "for line in
> file" is MUCH less explicit and leads to errors like this.
> 
> file = open(...)
> 
> count = 0
> for line in file:
>     if count > 10: break
>     print line
>     count = count + 1
> 
> for line in file:
>     print line
> 
> Doesn't work like I would expect.  This is essentially
> doing the following:
> 
> file = open(...)
> 
> count = 0
> for line in file.xreadlines():
>     if count > 10: break
>     print line
>     count = count + 1
> 
> for line in file.xreadlines():
>     print line
> 
> So what is REALLY happening is that you are creating two
> separate iterators in the above examples.  Writing "for
> line in file" instead of "for line in file.xreadlines()"
> simply hides and confuses this.
> 
> The problem with spawning multiple iterators is that there
> is a read cache going on behind the scenes and
> file.xreadlines() doesn't rewind the file to the starting
> point.


Actually this has nothing to do with iterators, or a "read
cache". iter(file) creates a line iterator that does the
same thing as file.readline() every time .next() is called,
until it reaches the end of the file. But file.readline(),
just like any other file read, starts reading at the
_current seek position_ of the file. 
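As a minimal sketch of this point on a modern Python 3 interpreter (an assumption; this thread concerns Python 2.2), the snippet replays Brian's pattern of breaking out of one loop and then starting another over the same open file. It uses the five-line sample file described next; `make_sample_file` is a hypothetical helper. In Python 3 a file object is its own iterator, so the second loop continues from wherever the file position was left.

```python
import os
import tempfile


def make_sample_file():
    """Hypothetical helper: write the five-line sample file to a temp path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as out:   # binary mode: no newline translation
        out.write(b"1\n2\n3\n4\n5\n")
    return path


path = make_sample_file()
with open(path) as f:
    assert iter(f) is f            # the file object is its own iterator
    first = []
    for line in f:
        first.append(line)
        if len(first) == 2:        # stop early, like Brian's "break"
            break
    rest = [line for line in f]    # resumes at the next unread line
os.remove(path)

print(first)  # ['1\n', '2\n']
print(rest)   # ['3\n', '4\n', '5\n']
```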

For instance, say we have a file with one digit per line,
like this:

1
2
3
4
5

The size may differ on your machine, but on mine this file
is exactly 10 characters long: each digit is followed by a
single line feed (with carriage return plus line feed line
endings it would be 15). If we now do:

file = open("file")

print file.tell()
for i in file:
    pass
print file.tell()

We see that the file position started at character 0, and
ended at character 10. Another attempt to read from the file
will produce nothing. However, if we now do:

file.seek(6)

then...

for i in file:
    print i,

... we get:

4
5
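For comparison, here is the same tell()/seek() experiment as a minimal Python 3 sketch (an assumption; the code above is Python 2). The file is written and read in binary mode, so it is 10 bytes on every platform and the offsets are plain byte counts; in text mode, Python 3 documents tell() values as opaque numbers rather than byte offsets.

```python
import os
import tempfile

# Setup (assumed): recreate the 10-byte sample file "1\n2\n3\n4\n5\n".
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"1\n2\n3\n4\n5\n")

with open(path, "rb") as f:      # binary mode: offsets are byte counts
    start = f.tell()             # position before reading anything
    for _ in f:                  # consume every line
        pass
    end = f.tell()               # position after the loop: end of file
    leftover = f.read()          # nothing is left to read
    f.seek(6)                    # byte 6 is where the line "4" begins
    tail = [line.decode() for line in f]
os.remove(path)

print(start, end, leftover)  # 0 10 b''
print(tail)                  # ['4\n', '5\n']
```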

In a nutshell, a file read starts at the current file
offset, which you can query with file.tell() and set with
file.seek(). This is the case whether you use iterators,
xreadlines(), readlines(), or just plain read().
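The same rule can be checked across the read methods named above. This Python 3 sketch uses io.BytesIO as an in-memory stand-in for the sample file (an assumption made to keep it self-contained); xreadlines() is left out because Python 3 removed it. After each seek(6), every method starts at the line holding 4.

```python
import io

# In-memory stand-in for the 10-byte sample file "1\n2\n3\n4\n5\n".
f = io.BytesIO(b"1\n2\n3\n4\n5\n")

# Each reader is called after seeking to byte 6, where the line "4" begins.
readers = {
    "read": lambda fh: fh.read(),
    "readline": lambda fh: fh.readline(),
    "readlines": lambda fh: fh.readlines(),
    "iteration": lambda fh: list(fh),
}

results = {}
for name, reader in readers.items():
    f.seek(6)                  # reset the offset before every read
    results[name] = reader(f)
    print(name, results[name])

# Expected output:
# read b'4\n5\n'
# readline b'4\n'
# readlines [b'4\n', b'5\n']
# iteration [b'4\n', b'5\n']
```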

Cheers,

Aldo

-- 
Aldo Cortesi
aldo at nullcube.com
www.nullcube.com

More information about the Python-list mailing list