Python speed vs csharp

Thu Jul 31 02:09:22 EDT 2003

Bear with me: this post is moderately long, but I hope it is relatively
succinct. 

I've been using Python for several years as a behavioral modeling tool for
the circuits I design. So far, it's been a good trade-off: compiled C++
would run faster, but the development time of Python is so much faster, and
the resulting code is so much more reliable after the first pass, that I've
never been tempted to return to C++. Every time I think stupid thoughts
like, "I'll bet I could do this in C++," I get out my copy of Scott Meyers'
"Effective C++," and I'm quickly reminded why it's better to stick with
Python (Meyers is a very good author, but points out lots of quirks and
pitfalls with C++ that I keep thinking that I shouldn't have to worry
about, much less try to remember). Even though Python is wonderful in that
regard, there are problems.

Here's the chunk of code that I'm spending most of my time executing:

import math

# Rational approximation for erfc(x) (Abramowitz & Stegun, Sec. 7.1.26)
# Fifth order approximation. |error| <= 1.5e-7 for x >= 0
#
def erfc( x ):
   p  =  0.3275911
   a1 =  0.254829592
   a2 = -0.284496736
   a3 =  1.421413741
   a4 = -1.453152027
   a5 =  1.061405429

   t = 1.0 / (1.0 + p*float(x))
   erfcx = ( (a1 + (a2 + (a3 +
             (a4 + a5*t)*t)*t)*t)*t ) * math.exp(-(x**2))
   return erfcx

This is an approximation to the complementary error function. It gets
called around 1.5 billion times during the simulation, and takes around
3500 seconds (just under an hour) to complete. While trying to speed things
up, I created a simple test case with the code above and a main function
that calls it 10 million times. The test case takes roughly 210 seconds to
run.
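
For anyone who wants to reproduce the measurement, here is a minimal sketch
of that kind of test case. It is not the original test code: the helper name
time_erfc, the fixed argument 0.5, and the use of time.perf_counter are
assumptions made for illustration, and the timings will differ from machine
to machine.

```python
import math
import time

def erfc(x):
    # Same Abramowitz & Stegun 7.1.26 approximation as above (x >= 0).
    p = 0.3275911
    a1 = 0.254829592
    a2 = -0.284496736
    a3 = 1.421413741
    a4 = -1.453152027
    a5 = 1.061405429
    t = 1.0 / (1.0 + p * float(x))
    poly = (a1 + (a2 + (a3 + (a4 + a5 * t) * t) * t) * t) * t
    return poly * math.exp(-(x ** 2))

def time_erfc(n_calls, x=0.5):
    # Hypothetical harness: call erfc() n_calls times with a fixed
    # argument and return (elapsed seconds, last result).
    start = time.perf_counter()
    result = 0.0
    for _ in range(n_calls):
        result = erfc(x)
    elapsed = time.perf_counter() - start
    return elapsed, result
```

Calling time_erfc(10000000) corresponds to the 10-million-call run described
above. Dividing the elapsed time by the call count gives a per-call cost that
can be compared directly against the C and C# versions.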

The current execution time is acceptable, but I need to increase the
complexity of the simulation, and will need to increase the number of data
points by around 20X, to roughly 30 billion. This will increase the
simulation time to over a day. Since the test case code was fairly small, I
translated it to C and ran it. The C code runs in approximately 7.5
seconds. That's compelling, but C isn't: part of my simulation includes a
parser to read an input file. I put that together in a few minutes in
Python, but there are no corresponding string or regex libraries with my C
compiler, so converting my Python code would take far more time than I'd
save during the resulting simulations.

On a lark, I grabbed the Mono C# compiler, and converted my test case to
C#. Here's the corresponding erfc code: 

   public static double erfc( double x )
   {
      double p, a1, a2, a3, a4, a5;
      double t, erfcx;

      p  =  0.3275911;
      a1 =  0.254829592;
      a2 = -0.284496736;
      a3 =  1.421413741;
      a4 = -1.453152027;
      a5 =  1.061405429;

      t = 1.0 / (1.0 + p*x);
      erfcx = ( (a1 + (a2 + (a3 +
                (a4 + a5*t)*t)*t)*t)*t ) * Math.Exp(-Math.Pow(x,2.0));
      return erfcx;
   }

Surprisingly (to me, at least), this code executed 10 million iterations in
8.5 seconds - only slightly slower than the compiled C code. 

My first question is, why is the Python code, at 210 seconds, so much
slower?

My second question is, is there anything that can be done to get Python's
speed close to the speed of C#?

-- Mike --