.readline() - VERY SLOW compared to PERL

David Bolen db3l at fitlinxx.com
Thu Nov 16 00:48:31 EST 2000


"Harald Schneider" <h_schneider at marketmix.com> writes:

> Thanks for your reply. .readlines() won't fit, since the data is VERY huge.
> So .readline() is a must.

Well, you can still use readlines() but just supply a maximum buffer
size so that it doesn't snarf too much of the file into memory at
once.  Python will avoid reading past that number (subject to a small
minimum like a few K I think), and then you can keep calling
readlines() to continue processing the file in chunks.  This can make
the I/O more efficient as well as the internal processing Python does
for each line.

Whether or not it works well for the other processing you need to do I
can't say, but as a point of comparison, the following two scripts:

Perl:

    open(INPUT,'file.input') or die "Failure opening";

    $count = 0;
    while (<INPUT>) {
      $count++;
    }

    print "$count\n";


Python:

    file = open('file.input')

    count = 0
    while 1:
        lines = file.readlines(8192)
        if not lines: break
        count = count + len(lines)

    print count


run on my machine (WinNT 4.0 SP4), against a text file of 100,000 lines
of 78 characters (8,000,000 bytes including line endings), in 0.951s for
the Python script and 0.651s for the Perl script.  Bumping the buffer
size up to 64K in the Python script drops it to 0.751s.

So you can get it to within about 15% of the Perl runtime, but of
course your mileage may vary once you do other processing within the
loop.
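
For anyone who wants to repeat the comparison above with different
buffer sizes, here is a minimal sketch in current Python 3 syntax.  The
function names count_lines and time_count are made up for illustration,
and the file is opened in binary mode on the assumption that only the
line count matters; absolute timings will of course differ from the
ones quoted above.

```python
import time


def count_lines(path, sizehint=8192):
    """Count lines by reading the file in chunks of roughly sizehint bytes."""
    count = 0
    # Binary mode is an assumption here: it skips text decoding, which is
    # fine when all we want is the number of lines.
    with open(path, "rb") as f:
        while True:
            # readlines(sizehint) returns whole lines until about
            # sizehint bytes have been read, or [] at end of file.
            lines = f.readlines(sizehint)
            if not lines:
                break
            count += len(lines)
    return count


def time_count(path, sizehint):
    """Return (line_count, elapsed_seconds) for one pass over the file."""
    start = time.perf_counter()
    n = count_lines(path, sizehint)
    return n, time.perf_counter() - start
```

Calling time_count('file.input', 8192) and then
time_count('file.input', 65536) gives the two data points to compare.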

In general, this sort of raw text processing is just one of those
cases where Perl is going to be more efficient than Python.
Such processing is something that plays into Perl's strengths and
something Perl was really designed to handle.  With that said, you can
generally get Python to be at least competitive (which I'd agree your
original 3x factor wasn't), and then it becomes a question of issues
such as maintainability and/or purpose of the script as to which
language might be more appropriate.

You also mentioned searching for a key in your first post, so I might
also mention that while you would probably use a regex pattern match
in Perl to locate lines with that key, depending on the effort to
isolate the key from within each line, you may find better performance
with Python by using functions from the string module as opposed to
regexes.
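
As a rough illustration of that last point, the sketch below shows the
same key lookup done with plain string operations and with a regex.  It
is written for current Python 3, where string methods and the "in"
operator take the place of the string module functions.  The function
names and the idea that the key is the first whitespace-separated field
are assumptions for the example, not something from the original
problem, and which variant is faster would have to be measured on the
real data.

```python
import re


def find_with_str(lines, key):
    """Lines containing key anywhere, using a plain substring test."""
    return [line for line in lines if key in line]


def find_with_regex(lines, key):
    """Same result as find_with_str, but via a compiled regex."""
    # re.escape makes the key match literally, like the substring test.
    pattern = re.compile(re.escape(key))
    return [line for line in lines if pattern.search(line)]


def find_by_first_field(lines, key):
    """Lines whose first whitespace-separated field equals key.

    Assumes (hypothetically) that the key is the first field of a line.
    """
    # split(None, 1) splits once on whitespace; [:1] copes with blank lines.
    return [line for line in lines if line.split(None, 1)[:1] == [key]]
```

If the key can be isolated with a split or a slice like this, the regex
machinery is not needed at all.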

--
-- David
-- 
/-----------------------------------------------------------------------\
 \               David Bolen            \   E-mail: db3l at fitlinxx.com  /
  |             FitLinxx, Inc.            \  Phone: (203) 708-5192    |
 /  860 Canal Street, Stamford, CT  06902   \  Fax: (203) 316-5150     \
\-----------------------------------------------------------------------/
