Performance question about math operations

Andrew P. Lentvorski bsder at mail.allcaps.org
Tue Jul 2 18:41:29 EDT 2002


Folks,

Please give me a *little* credit.  I am aware that Python doesn't do
constant folding, etc.  In the real code, those constants are variables.
However, my point was that the floating point operations are sufficiently
slow *even with constants* that they dominate the variable references.
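
To make the point concrete, the conversion being timed amounts to
something like the sketch below -- the names are illustrative, and in
the real code x_origin, y_origin, and zoomfactor are attributes looked
up off an object rather than literals:

    def c2s(p0, p1, x_origin=678.0, y_origin=456.0, zoomfactor=3.589):
        # two subtractions, two multiplies, one negation -- all boxed floats
        sp0 = (p0 - x_origin) * zoomfactor
        sp1 = -((p1 - y_origin) * zoomfactor)
        return sp0, sp1

    for i in xrange(1000000):
        c2s(1.0, 1.0)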

What I was hoping to provoke was a discussion about hoisting integer-
and floating-point-dominated paths into a Python optimization, or
possibly about looking at the efficiency of the dispatch table for FP-
and integer-dominated paths.

Just to note, Perl is somewhere between 2 and 4 times faster than Python
for the same code.  That speedup moves the coordinate math from being the
main bottleneck to just one of many comparable ones.

Furthermore, the C++ code uses actual variables and function calls, and
doesn't have any optimizations enabled.  In short, it represents the
*worst* possible performance C++ can deliver.  I have included the C++
code below for your edification.  I *have* looked at the assembly.  It
really does all the calculations.

Jonathan Hogg <jonathan at onegoodidea.com> wrote:
> Could you post the actual code, or is it too long/proprietary? How are
> all the magic numbers arrived at? Does it really look as simple as you
> suggested?

The code is too long, but it really is *that* simple.  Profiling has shown
that the coordinate conversion is *the* bottleneck.  Without removing it,
no other optimization will have any real effect (Amdahl's Law).  All of
those constants are actually variables in the real code, but with the
underlying math that slow, the variable references don't matter.
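
(The profile numbers come from the standard profiler, along the lines
of the snippet below; redraw() is just a stand-in name for the real
top-level repaint call:)

    import profile
    # redraw() is a stand-in; the real entry point repaints the layout view
    profile.run('redraw()')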

And, yes, I have disassembled the byte code looking for slow instructions.
The problem really appears to be fundamentally Python.
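
(Reproducing the disassembly is easy enough -- something like the
following, which shows the LOAD_FAST / BINARY_SUBTRACT /
BINARY_MULTIPLY stream the conversion spends its time in:)

    import dis
    dis.dis(c2s)    # c2s being the conversion function sketched above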

Terry Reedy <tjreedy at udel.edu> wrote:
> Use numerical Python extensions which work with arrays of unboxed
> floats and which only have to interpret +-*/ once each for entire
> arrays.

Well, I tried that.  I have used both Numeric and SciPy for a circuit
simulation engine, and they work well there; they fail for a layout
editor.  Special data structures are required to handle the 2-D region
querying that is the hallmark of a VLSI layout editor (10,000,000
polygons, with a 10^4 range in coordinate values, from 0.1 um to 1 cm).

I have to query the RTree for data (O(log n + k)), unpack all of the
coordinates (O(k)), duplicate them in an array (O(k) time and space,
wasted), keep track of their references (O(k) in time and space, wasted),
apply the transform (O(k)), rip the values back out so they can be
rendered as polygons (O(k)), and finally throw the array out.  The
overhead of the two wasted O(k) passes in time and space, just moving
things around in Python, swamps the slow math.
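
In pseudo-Python, the round trip looks roughly like the sketch below
(the rtree query interface and the poly.points attribute are made up
for illustration; the real ones are our own data structures):

    import Numeric

    def redraw_region(rtree, bbox, x_origin, y_origin, zoomfactor):
        polys = rtree.query(bbox)                # O(log n + k): region query
        coords = []
        for poly in polys:                       # O(k): unpack every coordinate
            coords.extend(poly.points)
        a = Numeric.array(coords)                # O(k) time and space: throwaway copy
        xs = (a[:, 0] - x_origin) * zoomfactor   # the cheap part: the actual math
        ys = -((a[:, 1] - y_origin) * zoomfactor)
        out = []
        n = 0
        for poly in polys:                       # O(k) again: rip the values back out
            m = len(poly.points)
            out.append(zip(xs[n:n+m], ys[n:n+m]))
            n = n + m
        return out                               # ...and the big array becomes garbage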

Thanks for all the help.  I appreciate all the different points of view.
-a


-------------- next part --------------

#include <iostream>

class DoublePair {
 public:
  double p0;
  double p1;
};

class ScreenCalc {
 public:
  double x_origin;
  double y_origin;
  double zoomfactor;
  
  ScreenCalc() : x_origin(678), y_origin(456), zoomfactor(3.589) {};

  DoublePair c2s(double p0, double p1);
};

DoublePair ScreenCalc::c2s(double p0, double p1)
{
  DoublePair dp;

  dp.p0 = (p0 - x_origin)*zoomfactor;
  dp.p1 = -((p1 - y_origin)*zoomfactor);

  return dp;
}

int main(void)
{
  ScreenCalc sc;
  DoublePair dp,dp2;

  double p0 = 1;
  double p1 = 1;

  for(int i = 0; i < 1000000; i++)
    {
      dp = sc.c2s(p0, p1);  // assign to the outer dp; declaring a new one here would shadow it
    }

  dp2 = dp;

  // Use the result so the loop's work cannot be treated as dead.
  std::cout << dp2.p0 << " " << dp2.p1 << std::endl;

  return 0;
}

