Lazy "for line in f" ?
Duncan Booth
duncan.booth at invalid.invalid
Mon Jul 23 04:33:36 EDT 2007
Alexandre Ferrieux <alexandre.ferrieux at gmail.com> wrote:
> On Jul 23, 9:36 am, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
>> Alexandre Ferrieux <alexandre.ferri... at gmail.com> writes:
>> > So I'll reiterate the question: *why* does the Python library add that
>> > extra layer of (hard-headed) buffering on top of stdio's?
>>
>> readline?
>
> I know readline() doesn't have this problem. I'm asking why the file
> iterator does.
>
Here's a program which can create a large file and either read it with
readline or iterate over the lines. Output from various runs should
answer your question.
The extra buffering means that iterating over a file is about three times
faster than repeatedly calling readline():
C:\Temp>test.py create 1000000
create file
Time taken=7.28 seconds
C:\Temp>test.py readline
readline
Time taken=1.03 seconds
C:\Temp>test.py iterate
iterate
Time taken=0.38 seconds
C:\Temp>test.py create 10000000
create file
Time taken=47.28 seconds
C:\Temp>test.py readline
readline
Time taken=10.39 seconds
C:\Temp>test.py iterate
iterate
Time taken=3.58 seconds
------- test.py ------------
import time, sys

NLINES = 10

def create():
    print "create file"
    f = open('testfile.txt', 'w')
    for i in range(NLINES):
        print >>f, "This is a test file with a lot of lines"
    f.close()

def readline():
    print "readline"
    f = open('testfile.txt', 'r')
    while 1:
        line = f.readline()
        if not line:
            break
    f.close()

def iterate():
    print "iterate"
    f = open('testfile.txt', 'r')
    for line in f:
        pass
    f.close()

def doit(fn):
    start = time.time()
    fn()
    end = time.time()
    print "Time taken=%0.2f seconds" % (end - start)

if __name__ == '__main__':
    if len(sys.argv) >= 3:
        NLINES = int(sys.argv[2])
    if sys.argv[1] == 'create':
        doit(create)
    elif sys.argv[1] == 'readline':
        doit(readline)
    elif sys.argv[1] == 'iterate':
        doit(iterate)
----------------------------
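For reference, the readahead that the file iterator layers on top of stdio
amounts to reading a large chunk at a time and splitting it into lines in
user space, rather than paying one C-level readline call per line. A minimal
sketch of that scheme (written in modern Python 3; `iter_lines` and
`chunk_size` are illustrative names, not the interpreter's actual internals):

```python
import io

def iter_lines(f, chunk_size=8192):
    """Yield lines from f by reading large chunks and splitting them,
    mimicking the readahead idea behind 'for line in f'."""
    buf = ""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            if buf:              # final line had no trailing newline
                yield buf
            return
        buf += chunk
        lines = buf.split("\n")
        buf = lines.pop()        # keep the partial last line for next chunk
        for line in lines:
            yield line + "\n"

# Produces the same lines as repeated readline() calls, but with one
# read() per chunk instead of one call per line.
sample = io.StringIO("This is a test file\nwith a lot of lines\n")
for line in iter_lines(sample, chunk_size=8):
    print(repr(line))
```

The downside of this design, and the likely source of the original
complaint, is that the bytes sitting in the readahead buffer have already
been consumed from the underlying stream, so interleaving the iterator with
other reads on the same file can appear to skip data.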