Basic file operation questions

Peter Otten __peter__ at web.de
Thu Feb 3 16:48:02 EST 2005


Caleb Hattingh wrote:

>> Yes, you can even write
>>
>> f = open("data.txt")
>> for line in f:
>>     # do stuff with line
>> f.close()
>>
>> This has the additional benefit of not slurping in the entire file at
>> once.
> 
> Is there disk access on every iteration?   I'm guessing yes?  It shouldn't
> be an issue in the vast majority of cases, but I'm naturally curious :)

Well, you will hardly find an OS that does no buffering of disk access --
but file.next() does some extra optimization as Steven already explained.
Here are some timings performed on the file that has the first-hand
information about Python's file buffering strategy :-)

$ python2.4 -m timeit 'for line in file("fileobject.c"): pass'
1000 loops, best of 3: 528 usec per loop
$ python2.4 -m timeit 'for line in file("fileobject.c").readlines(): pass'
1000 loops, best of 3: 635 usec per loop
$ python2.4 -m timeit 'for line in iter(file("fileobject.c").readline, ""): pass'
1000 loops, best of 3: 1.59 msec per loop
$ python2.4 -m timeit 'f = file("fileobject.c")' 'while 1:' '  if not f.readline(): break'
100 loops, best of 3: 2.08 msec per loop
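
If you want to repeat the comparison on your own machine, here is a rough
sketch of the same four loops as a script. It assumes a current Python 3,
where file() is gone and open() takes its place, and it times a throwaway
temporary file rather than fileobject.c. The helper names (make_sample,
time_all and the four loop functions) are made up for this example, and the
numbers will differ from the ones above.

```python
import os
import tempfile
import timeit


def make_sample(lines=2000):
    """Write a throwaway text file and return its path (made-up test data)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        for i in range(lines):
            f.write("line %d of some sample text\n" % i)
    return path


def iterate(path):
    # for line in f: iterate over the file object directly
    with open(path) as f:
        for line in f:
            pass


def use_readlines(path):
    # readlines() builds the complete list of lines first
    with open(path) as f:
        for line in f.readlines():
            pass


def use_iter_readline(path):
    # iter() with a sentinel calls readline() until it returns ""
    with open(path) as f:
        for line in iter(f.readline, ""):
            pass


def use_while_readline(path):
    # explicit loop around readline()
    with open(path) as f:
        while 1:
            if not f.readline():
                break


def time_all(path, number=20):
    """Return the best-of-3 time per loop, in seconds, for each variant."""
    results = {}
    for func in (iterate, use_readlines, use_iter_readline, use_while_readline):
        best = min(timeit.repeat(lambda: func(path), number=number, repeat=3))
        results[func.__name__] = best / number
    return results


if __name__ == "__main__":
    path = make_sample()
    try:
        for name, secs in time_all(path).items():
            print("%-20s %10.1f usec per loop" % (name, secs * 1e6))
    finally:
        os.remove(path)
```

Unlike the one-liners above, each function here also pays for the with
block, so only the relative order of the results is meaningful.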

So not only is

for line in file(...):
   # do stuff

the most elegant, it is also the fastest. file.readlines() comes close, but
is only viable for "small" files.
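
To give a feel for why iteration can be fast without holding the whole file
in memory, here is a small read-ahead sketch in plain Python. It is only an
illustration of the idea, not the code in fileobject.c; the name iter_lines
and the default buffer size of 8192 are assumptions made for this example.

```python
import io


def iter_lines(f, bufsize=8192):
    """Yield lines from f, fetching bufsize characters per read() call."""
    pending = ""                      # text read so far that is not yet a full line
    while True:
        chunk = f.read(bufsize)
        if not chunk:                 # empty string means end of file
            break
        pending += chunk
        lines = pending.split("\n")
        pending = lines.pop()         # the last piece may be an incomplete line
        for line in lines:
            yield line + "\n"
    if pending:                       # final line without a trailing newline
        yield pending


if __name__ == "__main__":
    sample = io.StringIO("first\nsecond\nthird")
    for line in iter_lines(sample, bufsize=4):
        print(repr(line))
```

Only one buffer's worth of text plus the current partial line is held at any
time, whereas readlines() has to keep every line in a list before the loop
can start.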

Peter



