CORRECTION: Re: Iterating through a file significantly slower when file has big buffer

python at bdurham.com
Mon Jan 26 18:48:01 EST 2009


Added the following lines, which were missing from my original post:

strategy1 = timer( 'Default buffer' )
strategy1.start()

Code below is now complete.

Malcolm
SOURCE:
import time
# timer class
class timer( object ):
    def __init__( self, message='' ):
        self.message = message
    def start( self ):
        self.starttime = time.time()
        print 'Start:  %s' % ( self.message )

    def stop( self ):
        print 'Finish: %s %6.2f' % ( self.message, time.time() - self.starttime )
# myFileName points to a 2G text file.
myFileName = r'C:\logs\jan2009.dat'
# default buffering
strategy1 = timer( 'Default buffer' )
strategy1.start()
myFile = open( myFileName )
for line in myFile:
    pass
myFile.close()
strategy1.stop()
# setting the buffer size to 16M
bufferSize = 2 ** 24
strategy2 = timer( 'Large buffer (%sk)' % (bufferSize/1024) )
strategy2.start()
myFile = open( myFileName, 'rt', bufferSize )
for line in myFile:
    pass
myFile.close()
strategy2.stop()
OUTPUT:
Start:  Default buffer
Finish: Default buffer  69.98
Start:  Large buffer (16384k)
Finish: Large buffer (16384k) 493.88  <--- 7x slower
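For anyone who wants to reproduce the comparison on a current interpreter, here is a minimal Python 3 sketch of the same benchmark. It writes a small hypothetical sample file instead of the original 2 GB log, and uses time.perf_counter() in place of time.time(); passing buffering=-1 to open() selects the platform default, while 2 ** 24 requests the 16 MB buffer from the post.

```python
import os
import tempfile
import time

# Hypothetical small sample file standing in for the 2 GB log in the post.
with tempfile.NamedTemporaryFile(mode='w', suffix='.dat', delete=False) as f:
    f.write('sample log line\n' * 100000)
    path = f.name

def time_iteration(buffering):
    # Time a full line-by-line pass over the file with the given buffer size.
    start = time.perf_counter()
    with open(path, 'rt', buffering=buffering) as fh:
        for line in fh:
            pass
    return time.perf_counter() - start

default_elapsed = time_iteration(-1)      # -1 = platform default buffering
large_elapsed = time_iteration(2 ** 24)   # 16 MB buffer, as in the post
print('default buffer: %.4fs' % default_elapsed)
print('16 MB buffer:   %.4fs' % large_elapsed)
os.remove(path)
```

Timings on a small in-memory-cached file will not show the 7x gap from the 2 GB run, but the sketch keeps the measurement structure identical between the two strategies.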
Any comments regarding this massive slowdown?
Thanks,
Malcolm