.readline() - VERY SLOW compared to PERL

Tue Nov 21 02:52:26 EST 2000

Thanks for your reply.

The alternative methods posted here (readlines with chunks) cut the poor
performance down to nearly the results you posted.

For everyone interested, here are the scripts used for testing:

---------------------------------------
#!/usr/bin/perl

use Time::HiRes;

$_db = 'test.dat';
print "Running...\n";

$secStart = Time::HiRes::time;

open(DB, "<$_db") or die "Can't open $_db";
while($dbline = <DB>) {
 @rs = split(/\;/, $dbline);
 if($rs[0] eq 'TEST') {
  print("FOUND: $dbline\n");
  last;
 }
}
print("DONE!\n");
close(DB);
printf("Elapsed time: %f sec.\n", (Time::HiRes::time - $secStart));

exit(0);
---------------------------------------
Result of the script above: 3.705 sec.
---------------------------------------

import sys, string, time

print "Running..."
dbname = 'test.dat'
secStart = time.time()

db = open(dbname, 'r')

while 1:
 dbline = db.readline()
 if not dbline:
  break

 rs = string.split(dbline, ';')

 if rs[0] == 'TEST':
  print dbline + "\n"
  break

print "DONE!\n"
db.close()
print "Elapsed time: %f sec." % (time.time() - secStart)
---------------------------------------
Result of the script above: 12.097 sec.
---------------------------------------

import sys, string, time

print "Running..."
dbname = 'test.dat'
secStart = time.time()

db = open(dbname, 'r')

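while 1:
    lines = db.readlines(250000)
    if not lines:
        break
    found = 0
    for dbline in lines:
        rs = string.split(dbline, ';')

        if rs[0] == 'TEST':
            print dbline + "\n"
            found = 1
            break
    # the inner break only leaves the for loop, so stop the while loop too
    if found:
        break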

print "DONE!\n"
db.close()
print "Elapsed time: %f sec." % (time.time() - secStart)
---------------------------------------
Result of the script above: 5.629 sec.
---------------------------------------
Environment: 100,000-line ASCII file of approx. 24 MB, Pentium III 550
MHz/Win2K/256 MB RAM/NTFS
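
For reference, the chunked variant can be wrapped in a small function so that
a match ends the search immediately. This is only a minimal sketch, and it
assumes a current Python 3 interpreter rather than the one timed above; the
name find_record, the sep parameter and the default chunk size are made up
for illustration.

```python
import os
import tempfile


def find_record(path, key, sep=";", sizehint=250000):
    """Return the first line whose first field equals key, or None.

    The file is read in chunks of roughly `sizehint` bytes via readlines(),
    so the whole file is never held in memory at once.
    """
    with open(path, "r") as db:
        while True:
            lines = db.readlines(sizehint)
            if not lines:  # end of file reached without a match
                return None
            for line in lines:
                # maxsplit=1: only the first field is needed
                if line.split(sep, 1)[0] == key:
                    return line  # return leaves both loops at once


if __name__ == "__main__":
    # Tiny hypothetical data file, only to show the call.
    with tempfile.NamedTemporaryFile("w", suffix=".dat", delete=False) as tmp:
        tmp.write("A;1\nTEST;2\nB;3\n")
    try:
        print(repr(find_record(tmp.name, "TEST")))
    finally:
        os.remove(tmp.name)
```

Returning from inside the inner loop avoids the extra flag that a plain
nested while/for needs in order to stop reading once the key is found.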

All the best,
Harald

"David Bolen" <db3l at fitlinxx.com> wrote in message
news:ur94ctqkw.fsf at ctwd0143.fitlinxx.com...
> "Harald Schneider" <h_schneider at marketmix.com> writes:
>
> > Thanks for your reply. .readlines() won't fit, since the data is VERY
> > huge. So .readline() is a must.
>
> Well, you can still use readlines() but just supply a maximum buffer
> size so that it doesn't snarf too much of the file into memory at
> once.  Python will avoid reading past that number (subject to a small
> minimum like a few K I think), and then you can keep calling
> readlines() to continue processing the file in chunks.  This can make
> the I/O more efficient as well as the internal processing Python does
> for each line.
>
> Whether or not it works well for the other processing you need to do I
> can't say, but as a point of comparison, the following two scripts:
>
> Perl:
>
>     open(INPUT,'file.input') or die "Failure opening";
>
>     $count = 0;
>     while (<INPUT>) {
>       $count++;
>     }
>
>     print "$count\n";
>
>
> Python:
>
>     file = open('file.input')
>
>     count = 0
>     while 1:
>         lines = file.readlines(8192)
>         if not lines: break
>         count = count + len(lines)
>
>     print count
>
>
> Run on my machine (WinNT 4.0 SP4) on a text file of 100,000 lines of
> 78 characters (8000000 bytes including line endings) in .951s for the
> Python script and .651s for the Perl script.  Bumping the buffer size
> up to 64K in the Python script drops it to .751s.
>
> So you can get it to within about 15% of the Perl runtime, but of
> course that mileage may vary once you do other processing within the
> loop.
>
> In general, this sort of raw text processing is just one of those
> cases where Perl is going to be more efficient than Python.
> Such processing is something that plays into Perl's strengths and
> something Perl was really designed to handle.  With that said, you can
> generally get Python to be at least competitive (which I'd agree your
> original 3x factor wasn't), and then it becomes a question of issues
> such as maintainability and/or purpose of the script as to which
> language might be more appropriate.
>
> You also mentioned searching for a key in your first post, so I might
> also mention that while you would probably use a regex pattern match
> in Perl to locate lines with that key, depending on the effort to
> isolate the key from within each line, you may find better performance
> with Python by using functions from the string module as opposed to
> regexes.
>
> --
> -- David
> --
> /-----------------------------------------------------------------------\
>  \               David Bolen            \   E-mail: db3l at fitlinxx.com  /
>   |             FitLinxx, Inc.            \  Phone: (203) 708-5192    |
>  /  860 Canal Street, Stamford, CT  06902   \  Fax: (203) 316-5150     \
> \-----------------------------------------------------------------------/
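
On the last point above (string functions versus regexes for isolating the
key), here is a rough sketch of three equivalent checks for a key in the
first ';'-separated field. It assumes a current Python 3 interpreter and uses
string methods, the modern counterpart of the string module functions; the
function names and the 'TEST' key are only illustrative. No timings are
claimed here, since they depend on the data and the interpreter.

```python
import re

# Regex version: the key anchored at the start of the line, followed by ';'
KEY_RE = re.compile(r"^TEST;")


def matches_regex(line):
    return KEY_RE.match(line) is not None


def matches_split(line, key="TEST", sep=";"):
    # maxsplit=1 stops splitting after the first separator
    return line.split(sep, 1)[0] == key


def matches_startswith(line, prefix="TEST;"):
    # Cheapest check when the key is known to sit at the start of the line
    return line.startswith(prefix)


if __name__ == "__main__":
    import timeit

    sample = "TEST;field2;field3\n"
    for fn in (matches_regex, matches_split, matches_startswith):
        secs = timeit.timeit(lambda: fn(sample), number=100000)
        print("%-20s %.3f sec" % (fn.__name__, secs))
```

The three checks agree for lines that contain at least one separator; a line
consisting of the bare key with no ';' would match only the split version.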