Python IO performance?

Ganesan R rganesan at myrealbox.com
Sun Jun 1 00:30:19 EDT 2003


>>>>> "Chad" == Chad Netzer <cnetzer at mail.arc.nasa.gov> writes:

> On Sat, 2003-05-31 at 00:26, Ganesan R wrote:
>> Python is over 8 times slower! Is the problem with the fileinput
>> module or is I/O just slower with python?

> Probably a few things.  One is that this case really favors Perl because
> it has operators for doing these things, and is heavily optimized for
> such cases.

I agree that this case favors Perl. It's just that I am used to writing
quick hacks like this for text processing in Perl. After learning Python
I've been resisting my impulse to code them in Perl and using Python
instead. I always had a feeling that my Python scripts ran much slower,
so I decided to do some timing tests to check my perception; hence
the post.

> Secondly, you are timing the program startup time plus the loop.  Python
> has to compile the program before executing it (don't know how Perl does
> this, probably the same), then import a module before it does the loop. 
> This adds a fixed overhead (for small input files, the startup time
> could dominate).  Note that python doesn't create a pre-compiled
> mycat.pyc file when you run a script directly on the command line like
> this (it only does it when importing a module, or when explicitly told).

I made sure that this is not a problem. Doubling the size of the file
approximately doubled the time taken. So the overhead is pretty minimal in
this case. 
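
A minimal sketch of the reasoning behind that check, assuming run time is
roughly linear in input size (time = fixed startup cost + cost that scales
with the file). The function name and the sample timings are illustrative
only; they are not measurements from this thread.

```python
def split_fixed_and_variable(time_n, time_2n):
    """Split two wall-clock timings into fixed and size-dependent parts.

    Assumes t(size) = fixed + rate * size, with time_n measured on a file
    of some size and time_2n measured on a file twice as large.
    """
    variable = time_2n - time_n   # cost of processing the original file
    fixed = time_n - variable     # what is left over is startup overhead
    return fixed, variable


if __name__ == "__main__":
    # Hypothetical timings in seconds, chosen only to show the arithmetic.
    fixed, variable = split_fixed_and_variable(0.75, 1.45)
    print("fixed overhead: %.2f s, size-dependent cost: %.2f s" % (fixed, variable))
```

If doubling the input roughly doubles the time, the fixed part comes out
close to zero, which is the situation described above.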

> Thirdly, the fileinput module itself is not the fastest method.  Here is
> my quick hack version, that goes quite a bit faster, and uses file
> iteration directly:

> ==== - mycat2.py
> import sys

> if len( sys.argv ) < 2:
>     sys.exit()

> f = file( sys.argv[1], "r" )
> for line in f:
>     print line,
> f.close()
> ====

I noticed this myself after my post. A similar version that I wrote took
under 0.3 secs, compared to over 0.7 secs for the version using fileinput.
Much better, but still about 3.5 times slower than the Perl version.
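
For anyone who wants to repeat the fileinput versus direct-iteration
comparison in one process, here is a rough sketch. It is written for a
current Python 3 interpreter rather than the 2.2/2.3 versions discussed
here, so the absolute numbers will not match the ones above. The helper
names and the generated sample file are assumptions made for illustration.

```python
import fileinput
import os
import tempfile
import time


def count_with_fileinput(path):
    """Count lines by iterating through the fileinput module."""
    count = 0
    with fileinput.input(files=[path]) as lines:
        for _ in lines:
            count += 1
    return count


def count_with_iteration(path):
    """Count lines by iterating over the file object directly."""
    count = 0
    with open(path, "r") as f:
        for _ in f:
            count += 1
    return count


def time_call(func, path):
    """Return (result, elapsed seconds) for a single call of func(path)."""
    start = time.perf_counter()
    result = func(path)
    return result, time.perf_counter() - start


def make_sample_file(num_lines):
    """Write a throwaway text file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt", text=True)
    with os.fdopen(fd, "w") as f:
        for i in range(num_lines):
            f.write("line %d of the sample input\n" % i)
    return path


if __name__ == "__main__":
    path = make_sample_file(200000)
    try:
        for func in (count_with_fileinput, count_with_iteration):
            n, elapsed = time_call(func, path)
            print("%-22s %d lines in %.3f s" % (func.__name__, n, elapsed))
    finally:
        os.remove(path)
```

Timing inside the process leaves out interpreter startup, so this only
compares the two loops, not the whole script run.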


> Python 2.3 beta1 has improved the file iteration even more.  Here are
> those timings:

I saw some posts mentioning roughly a 25-30% improvement in performance in
general. It's good to know that file iteration is also being addressed.

> Now my version is about 2.5 times slower than perl.  It is probably not
> the case that Python will ever catch up to Perl completely for this
> benchmark (again, this benchmark happens to play to Perl's strengths in
> using language operators to efficiently handle file IO under the
> covers), or even other basic file IO benchmarks.  Perl has always
> performed better in that area, and is designed to be quick when doing
> IO.

Interestingly, when I ran strace on both the Perl and Python versions, the
actual system calls were virtually identical (4k reads and writes). So I
guess the issue is with the user-space libraries, as Aahz suggests in his
post. I do hope fileinput performance is addressed. It's a natural
choice for writing Unix filters, and a 2.5x slowdown over a directly
coded version is not acceptable.
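
As a sketch of what a "directly coded" filter might look like without
fileinput: read each file named on the command line in turn, or fall back
to standard input when none is given. This is written for current Python 3;
`iter_lines` and its `stdin` parameter are hypothetical names, and it
deliberately skips fileinput conveniences such as treating "-" as stdin.

```python
import sys


def iter_lines(paths, stdin=None):
    """Yield lines from each named file in order, or from stdin if none are named."""
    if not paths:
        # No file arguments: behave like a plain Unix filter.
        yield from (stdin if stdin is not None else sys.stdin)
        return
    for path in paths:
        with open(path, "r") as f:
            yield from f


def main(argv):
    """Copy every input line to standard output, like a minimal cat."""
    for line in iter_lines(argv[1:]):
        sys.stdout.write(line)
```

Calling `main(sys.argv)` at the bottom of the file turns this into a
cat-like script; the per-line work would go inside the loop in `main`.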

> But as you can see, there have been big improvements made to Python's IO
> processing speed, and once the processing of the IO happens, depending
> on what is being done, these benchmarks may no longer apply.  I'd assume
> Perl is still faster for regular expression stuff, for example, but
> maybe not by much.  Others will know more about this than I (I last used
> Perl at version 4).

Actually, I first started with a script using regexps. After some tests I
figured out that the I/O itself seemed to be the bottleneck :-(. I remember
Perl using an alternative I/O library called sfio; I don't know if that's the
standard in shipping binaries. Anyway, let me do some digging in the Python
2.3 sources. Maybe there's more scope for improvement.

Ganesan




