Python IO performance?

Rob Hall robhall at ii.net
Mon Jun 2 12:31:12 EDT 2003


I have a Python program that iterates through a HUGE number of data files.
Basically, it performs calculations on a large data set and spits out
another data set just as big.

On my P3 it takes about 6 days to complete at a rate of around 12500 sets
per second, with a peak of around 14000 sets/sec and a minimum of around
5000 sets/sec.  Obviously, it would be nice to speed this up a bit.

I have optimised the calculations, but the real performance bottleneck is
IO.
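
For anyone wanting to check a claim like this on their own data, here is a
rough sketch of timing the IO and the calculation separately.  It is not the
program discussed in this post: it assumes a current Python 3, one record
("set") per line of text, and a hypothetical calc() function standing in for
the real calculation.  The names timed_run and bufsize are likewise made up
for illustration.

```python
import time


def timed_run(in_path, out_path, calc, bufsize=1 << 20):
    """Copy in_path to out_path through calc(), timing IO and calc separately.

    Assumes one record per line; calc is a hypothetical stand-in that takes
    a line of text and returns the text to write.
    Returns (record_count, io_seconds, calc_seconds).
    """
    clock = time.perf_counter
    io_time = 0.0
    calc_time = 0.0
    count = 0
    # A large buffer (1 MiB here, an arbitrary choice) reduces the number
    # of system calls made for reads and writes.
    with open(in_path, "r", buffering=bufsize) as src, \
            open(out_path, "w", buffering=bufsize) as dst:
        while True:
            t0 = clock()
            line = src.readline()
            t1 = clock()
            if not line:  # empty string means end of file
                io_time += t1 - t0
                break
            result = calc(line)
            t2 = clock()
            dst.write(result)
            t3 = clock()
            io_time += (t1 - t0) + (t3 - t2)
            calc_time += t2 - t1
            count += 1
    # The final flush when the files close is not included in io_time.
    return count, io_time, calc_time
```

The per-record clock calls add overhead of their own, so the absolute rate
will be lower than in an untimed run; the ratio of the two totals is the
useful part.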

I converted the program into Ada - Ada being very fast.  But there were no
real performance gains here.

When I read your post I decided to give it a go with Perl.  It was a bit
tedious, as I haven't touched Perl for about 5 years!  I remembered why I
don't enjoy it - I find Python fun, but Perl just seemed like work!  But the
results were nothing short of excellent!

I have been running the new Perl version for about half an hour now, and it
started at around 18500 sets/sec!  However, over this time the rate has
dropped to about 17200, with a minimum of around 15300.

I'm currently downloading Perl 5.8 to see if it is any faster in this respect.

The other thing that has been nice is that the Perl script is a lot
friendlier to my other processes.  The Python equivalent tends to be a
resource hog and slows down my system terribly (it's my desktop, and I must
use it for other things while the script is running).

All in all, I'm very happy with my Perl script, but I would have had some
trouble writing it if I did not already have my Python 'pseudocode' to work
from.  Yes, I agree that Python definitely needs some work done on IO.

Rob





