Python IO performance?

Chad Netzer cnetzer at mail.arc.nasa.gov
Sat May 31 04:37:33 EDT 2003


On Sat, 2003-05-31 at 00:26, Ganesan R wrote:

> Python is over 8 times slower! Is the problem with the fileinput
> module or is I/O just slower with python?

Probably a few things.  First, this case really favors Perl: it has
built-in operators for this kind of line-by-line file IO, and it is
heavily optimized for such cases.

Secondly, you are timing the program startup time plus the loop.  Python
has to compile the program before executing it (I don't know how Perl
does this; probably the same), then import a module before it runs the
loop.  This adds a fixed overhead (for small input files, the startup
time could dominate).  Note that Python doesn't create a pre-compiled
mycat.pyc file when you run a script directly on the command line like
this (it only does so when importing a module, or when explicitly told).
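One way to separate the fixed startup cost from the per-line cost is to
time only the loop from inside the script.  Here is a minimal sketch of
that idea.  Note the assumptions: it is written for present-day Python 3
(not the 2.2 used for the timings in this post), the helper name
time_line_loop is made up for illustration, and it counts lines instead
of printing them so that output cost doesn't get mixed in:

```python
import io
import time


def time_line_loop(stream):
    """Iterate over stream line by line; return (line_count, elapsed_seconds)."""
    start = time.perf_counter()
    count = 0
    for _line in stream:
        count += 1
    elapsed = time.perf_counter() - start
    return count, elapsed


# Demo on an in-memory stream; a real run would pass open(path) instead.
sample = io.StringIO("first\nsecond\nthird\n")
line_count, seconds = time_line_loop(sample)
print("%d lines in %.6f s (loop only, startup excluded)" % (line_count, seconds))
```

Comparing that number with what the shell's "time" reports gives a rough
idea of how much of the total is interpreter startup and imports.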

Thirdly, the fileinput module itself is not the fastest method.  Here is
my quick hack version, which runs quite a bit faster and iterates over
the file object directly:

==== - mycat2.py
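import sys

# Exit quietly if no input file was given.
if len( sys.argv ) < 2:
    sys.exit()

f = file( sys.argv[1], "r" )
# Iterate over the file object directly instead of going through fileinput.
for line in f:
    print line,    # trailing comma: the line already ends with a newline
f.close()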
====
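For anyone trying this on a current interpreter, here is a rough Python 3
sketch of the two approaches side by side.  It is not the code that was
timed in this post; the function names and the scratch file are made up
for illustration.  It only checks that both approaches yield the same
lines, so that any difference you measure between them is purely one of
speed:

```python
import fileinput
import os
import tempfile


def lines_via_fileinput(path):
    """Read all lines through the fileinput module."""
    with fileinput.input(files=[path]) as stream:
        return list(stream)


def lines_via_iteration(path):
    """Read all lines by iterating over the file object directly."""
    with open(path, "r") as f:
        return list(f)


# Write a small scratch file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    scratch_path = tmp.name

try:
    same = lines_via_fileinput(scratch_path) == lines_via_iteration(scratch_path)
    print("identical output:", same)
finally:
    os.remove(scratch_path)
```

How the two compare in speed on a modern Python is something you would
have to measure yourself; the timings in this post apply only to the
versions named in it.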

And here are some timings using perl 5.8.0 and python 2.2.3, and a
fairly large input file (6 megs):

$ time perl  mycat.pl ~/foo.txt >/dev/null
real    0m1.233s
user    0m1.090s
sys     0m0.020s

$ time python  mycat.py ~/foo.txt >/dev/null
real    0m11.327s
user    0m11.160s
sys     0m0.050s

$ time python  mycat2.py ~/foo.txt >/dev/null
real    0m4.014s
user    0m3.860s
sys     0m0.030s

So, in this case, the original Python program is about 9 to 10 times
slower than Perl, and mine is about 3.3 to 3.5 times slower (depending
on whether you compare real or user time).
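The slowdown factors can be recomputed from the figures above.  A small
sketch in Python 3 syntax; the numbers are copied from the three timing
runs, and the dictionary layout is just my own choice for illustration:

```python
# Wall-clock ("real") and CPU ("user") seconds copied from the timings above.
timings = {
    "perl mycat.pl": {"real": 1.233, "user": 1.090},
    "python mycat.py": {"real": 11.327, "user": 11.160},
    "python mycat2.py": {"real": 4.014, "user": 3.860},
}

baseline = timings["perl mycat.pl"]
ratios = {}
for name, t in timings.items():
    # Slowdown factor relative to the Perl run, for each kind of time.
    ratios[name] = {kind: t[kind] / baseline[kind] for kind in ("real", "user")}
    print("%-18s real x%.1f  user x%.1f"
          % (name, ratios[name]["real"], ratios[name]["user"]))
```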

Python 2.3 beta1 has improved the file iteration even more.  Here are
those timings:

$ time python2.3  mycat2.py ~/foo.txt >/dev/null
real    0m2.757s
user    0m2.590s
sys     0m0.040s

$ time python2.3  mycat.py ~/foo.txt >/dev/null
real    0m8.420s
user    0m8.250s
sys     0m0.050s


Now my version is about 2.2 to 2.4 times slower than Perl.  Python will
probably never catch up to Perl completely on this benchmark (again, it
happens to play to Perl's strengths in using language operators to
handle file IO efficiently under the covers), or even on other basic
file IO benchmarks.  Perl has always performed better in that area, and
is designed to be quick when doing IO.

But as you can see, there have been big improvements to Python's IO
speed.  And once your program does real work on each line, these
benchmarks may no longer apply, depending on what that work is.  I'd
assume Perl is still faster for regular expression stuff, for example,
but maybe not by much.  Others will know more about this than I do (I
last used Perl at version 4).

I hope this helps.

-- 

Chad Netzer
(any opinion expressed is my own and not NASA's or my employer's)





