Iterating through a file significantly slower when file has big buffer

python at bdurham.com
Mon Jan 26 18:42:29 EST 2009


I'm working with very large text files and am always looking for
ways to optimize the performance of our scripts.
While reviewing our code, I wondered whether giving our file
objects a very large buffer might speed up our file I/O.
Intuitively, I thought that a bigger buffer would improve
performance by reducing the number of reads. Instead I observed
just the opposite: performance was 7x slower (~500 sec vs. 70
sec) and memory use was 3x higher (24M vs. 8M) due to the larger
buffer.

The following tests were run on a Windows XP system using Python
2.6.1.
SOURCE:
import time
# timer class
class timer( object ):
    def __init__( self, message='' ):
        self.message = message
    def start( self ):
        self.starttime = time.time()
        print 'Start:  %s' % ( self.message )

    def stop( self ):
        print 'Finish: %s %6.2f' % ( self.message, time.time() - self.starttime )
# myFileName points to a 2G text file.
myFileName = r'C:\logs\jan2009.dat'
# default buffering
strategy1 = timer( 'Default buffer' )
strategy1.start()
myFile = open( myFileName )
for line in myFile:
    pass
myFile.close()
strategy1.stop()
# setting the buffer size to 16M
bufferSize = 2 ** 24
strategy2 = timer( 'Large buffer (%sk)' % (bufferSize/1024) )
strategy2.start()
myFile = open( myFileName, 'rt', bufferSize )
for line in myFile:
    pass
myFile.close()
strategy2.stop()
OUTPUT:
Start:  Default buffer
Finish: Default buffer  69.98
Start:  Large buffer (16384k)
Finish: Large buffer (16384k) 493.88  <--- 7x slower
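
To help narrow down where the time is going, here is a sketch of a
follow-up test I could run (untested, and assuming the same 2G file):
it reads the file in raw 64K chunks instead of iterating over lines,
once with the default buffer and once with the 16M buffer. If the raw
reads showed the same 7x gap, the problem would seem to be in the
buffering layer itself rather than in the line iteration.

import time

# Sketch of a follow-up test (untested): read the same 2G file in raw
# 64K chunks, with the default buffer and then with the 16M buffer,
# so no line splitting is involved in either pass.
myFileName = r'C:\logs\jan2009.dat'
chunkSize = 2 ** 16  # 64K reads

for bufferSize in ( -1, 2 ** 24 ):  # -1 = platform default, then 16M
    if bufferSize == -1:
        label = 'default'
    else:
        label = '%sk' % ( bufferSize / 1024 )
    start = time.time()
    myFile = open( myFileName, 'rb', bufferSize )
    while True:
        chunk = myFile.read( chunkSize )
        if not chunk:
            break
    myFile.close()
    print 'Raw reads, %s buffer: %6.2f' % ( label, time.time() - start )
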
Any comments regarding this massive slowdown?
Thanks,
Malcolm