File handling summary (was newbie question...)...

Alexander Sendzimir sendzimir at earthlink.net
Fri Dec 31 14:27:24 EST 1999


This is a summary of what I learned from the "newbie
question..." thread that I started on 1999.12.28 about
Python idioms for reading text files.

Thanks to all those who responded and contributed valuable
information, especially Justin Sheehy, Eugene Goodrich,
Aahz Maruch, Skip Montanaro, Amit Patel, and Fredrik Lundh.

All measurements are over 100 runs, each run timed with
GNU time. I make no claims about the accuracy of these
numbers; I'm exercising my newfound Python muscles more
than anything. However, I find it interesting that passing
a size hint to readlines() does seem to reduce the time
spent in the kernel. Of course, looping through the input
file one line at a time with readline() is the slowest by
far (duh). I'm curious about the mechanism by which
readlines() and readlines(sizehint) operate and how they
differ. I'll sift through Python's source sometime soon,
but not now. I think it would be interesting to try this
on truly large text files, on the order of 80-100MB.
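
For readers who have not seen the three idioms side by side,
here is a minimal sketch of each. It is not the code from
fileread.tgz; it is written for a present-day Python 3
interpreter, and the function names and the make_sample()
helper are made up for illustration. Each function only
counts lines, so the three can be checked against each other.

```python
import os
import tempfile

def count_readlines(path):
    """Read the whole file into one list of lines, then count them."""
    with open(path) as infile:
        return len(infile.readlines())

def count_readlines_hint(path, sizehint=4096 << 4):
    """Read batches of whole lines, each batch roughly sizehint bytes."""
    total = 0
    with open(path) as infile:
        while True:
            lines = infile.readlines(sizehint)
            if not lines:          # an empty list means end of file
                break
            total += len(lines)
    return total

def count_readline_loop(path):
    """Read one line per call until readline() returns an empty string."""
    total = 0
    with open(path) as infile:
        while True:
            line = infile.readline()
            if not line:           # '' means end of file
                break
            total += 1
    return total

def make_sample(n_lines=1000):
    """Hypothetical helper: write a small du-like text file, return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        for i in range(n_lines):
            f.write("4.0K\t/some/path/%d\n" % i)
    return path
```

The sizehint variant keeps only one batch of lines in memory
at a time, which is the main practical difference from a
bare readlines() on a large file.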

The data file is 4,851,630 bytes and was created with the
command "su -c 'du -ah /.'".

The code I wrote to obtain these values may be ftp'ed from
ftp://www.battleface.com/pub/ as fileread.tgz.

--------------------------------------------------------
Micron Transport Trek2 : 266MHz Pentium anything-but-a-celeron laptop: no swapping

infile.readlines() method:
   Total time:               107.85
   Average kernel time/run:    0.13
   Average task time/run:      0.95
   Average time/run:           1.08

infile.readlines( 4096 << 4 ) method:
   Total time:                98.04
   Average kernel time/run:    0.06
   Average task time/run:      0.92
   Average time/run:           0.98

infile.readline() loop method:
   Total time:               212.59
   Average kernel time/run:    0.03
   Average task time/run:      2.10
   Average time/run:           2.13
------------------------------------------------------
VALinux dual Pentium III (Xeon) : 500MHz : no swapping

infile.readlines() method:
   Total time:                85.91
   Average kernel time/run:    0.14
   Average task time/run:      0.72
   Average time/run:           0.86

infile.readlines( 4096 << 4 ) method:
   Total time:                73.56
   Average kernel time/run:    0.05
   Average task time/run:      0.69
   Average time/run:           0.74

infile.readline() loop method:
   Total time:               162.96
   Average kernel time/run:    0.03
   Average task time/run:      1.60
   Average time/run:           1.63
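
To compare the methods at a glance, the "Average time/run"
figures above can be expressed relative to the plain
readlines() case. This is only arithmetic on the numbers
already listed; the dictionary keys and the function name
are labels chosen for this sketch.

```python
# "Average time/run" values copied from the tables above.
timings = {
    "Micron Transport Trek2": {
        "readlines": 1.08, "readlines_hint": 0.98, "readline_loop": 2.13,
    },
    "VALinux dual Pentium III": {
        "readlines": 0.86, "readlines_hint": 0.74, "readline_loop": 1.63,
    },
}

def relative_to_baseline(per_method, baseline="readlines"):
    """Divide each method's average time by the baseline method's time."""
    base = per_method[baseline]
    return {name: round(t / base, 2) for name, t in per_method.items()}

for machine, per_method in timings.items():
    print(machine, relative_to_baseline(per_method))
```

On both machines the readline() loop comes out at roughly
twice the readlines() time, and the size hint trims about
10 to 15 percent off it.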







