[SciPy-dev] Thoughts on weave improvements

Mon Feb 11 18:03:18 EST 2002

Hey Pearu,

> First, let me say that I appreciate very much the idea of PyCOD and what
> it tries to accomplish. Nevertheless, I find that a bucket of cold water
> is in order. I hope it will be constructive ;-)

Cold water welcomed.  Just remember Faraday's comment when someone asked him
"Of what use is this knowledge?" concerning his experiments and theories of
electricity.  He responded, "Of what use is a child?" Weave/pycod are way
down the list of significance, but at a comparable stage in development.
Just because implementation isn't complete or portions are slow doesn't mean
the ideas shouldn't be pursued.  weave is actually fairly complete, and
PyCOD's current capabilities are jaw dropping (at least to me).  I imagine
making PyCOD general purpose is quite a bit of work, but its current code
base of 1000 lines does a whale of a lot.  I'm not yet familiar enough with
it or the issues it raises to know if general applicability is even feasible.
Even if it isn't, I am sure that there is a large enough sub-set of
important cases that it can/will cover to make it quite useful.

>
> Ok, I can see few issues about accelerating Python with PyCOD or by
> any other (already available) means:
>
> 1) About C++ extensions that PyCOD would generate. Why not C? Compiling
> C++ can be very time and memory consuming task when compared to C or
> Fortran compilation. For example, the gain from weave (as tested with
> weave tests) was not so remarkable on my PII,96MB laptop until I upgraded
> the memory to 160MB and C++ could run without swapping. I find this issue
> quite serious drawback also when using weave, especially when developing
> programs.

Multiple points:
1. Why not C?
The main reason is that C++ allows users to write much simpler code without
arduous error handling, very little reference count handling, and limited
knowledge of the Python API.  The combination of C++ exceptions (the biggest
win IMO) and class libraries such as CXX, SCXX, Boost, etc. with simple
syntax to access Python objects make this possible.  Someone on
comp.lang.python mentioned that weave looked like extension programming "for
the rest of us."  This is indeed its goal.

2.  C++ is slow to compile.
Well, it depends on what you're compiling.  Templates are the problem, not C++
in general (and not all templates are expensive).  My bet is the swapping
was caused by blitz++, not the generic weave code.  Standard weave.inline
calls on my W2K, PIII 850 MHz laptop with 300+ MB and the Microsoft compiler
take about 1.5 seconds.  Functions that use blitz take 20-30 seconds.  The
1.5 second compile times could be reduced to less than a second if we didn't
use CXX (which uses templates).  This is likely to happen with SCXX (or some
variant), its most likely replacement.  Still, these times are not
likely to be the ones people complain about.  Converting the blitz++ code
generated by weave.blitz to C is certainly doable, but not high on the
priority list.

Also, machines are getting faster all the time.  That is no excuse for
writing stupid code, but things that swap today will fit into the BIOS in a
year or two. :|   I think a weave/pycod solution will be production quality
about then, so we should be in good shape.  I know not all machines will
handle the compiles easily, but a vast majority will.

The biggest strike against C++ in my mind is the "broken compiler" problem.
If we run into many more goofy things like the exception bug in Mandrake
Linux that showed up a few weeks ago, then you start to wonder...

Final note.  It is easy to create separate backends to weave so that it
generates pure C code.  I'm happy with C++, but if someone really wants
this, they can add the capability, and I will include it in the
distribution.

3. Development time
One of the major themes of PyCOD and weave.blitz is that you can develop
entirely in Python and then "flip a magic switch" that provides a large
performance improvement, so compile times need not get in the way while
you are still developing.

4. Performance gains
On some algorithms that have to call back into Python, the improvement is
small.  I consider a factor of 2 the lower limit of useful improvement and a
factor of 10 the lower limit of exciting improvement.  On the Laplace problem
studied by Prabhu, the improvement was about a factor of 10 over Numeric.
The weave solution was actually faster than wrapped Fortran and within 25%
of a pure C++ approach.  On vector quantization algorithms, the speedup is
more on the order of 100.  These are both real world/useful algorithms, and
weave is relatively young (less than a year).  There are multiple things
that can improve its performance (mainly reducing calling overhead), so I
think things will only get better.

For Prabhu's notes, see here:

    http://www.scipy.org/site_content/weave/python_performance.html
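
For readers who have not seen that page, the kind of inner loop at issue in a
Laplace benchmark looks roughly like the pure-Python sweep below.  This is
only a sketch under assumptions of my own: the names laplace_step and solve,
the list-of-lists grid, and the equal-spacing 4-point update are illustrative
and are not taken from Prabhu's code.  Nested loops of this shape are what
run slowly in the interpreter.

```python
def laplace_step(u):
    """One in-place sweep of a 4-point averaging update on a 2-D grid.

    u is a list of lists of floats; boundary rows and columns are held fixed.
    Returns the largest change made to any interior point.
    """
    max_change = 0.0
    for i in range(1, len(u) - 1):
        for j in range(1, len(u[i]) - 1):
            new = 0.25 * (u[i - 1][j] + u[i + 1][j]
                          + u[i][j - 1] + u[i][j + 1])
            max_change = max(max_change, abs(new - u[i][j]))
            u[i][j] = new
    return max_change


def solve(u, tol=1e-6, max_sweeps=10000):
    """Sweep until the largest update falls below tol; return sweeps used."""
    for sweep in range(1, max_sweeps + 1):
        if laplace_step(u) < tol:
            return sweep
    return max_sweeps


# 5x5 grid: top edge held at 1.0, everything else starts at 0.0.
grid = [[1.0] * 5] + [[0.0] * 5 for _ in range(4)]
sweeps = solve(grid)
```

Every pass through the inner loop costs several list lookups and float
operations in the interpreter, which is why moving just this loop into
compiled code can pay off so well.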

On what problems did you see the small improvement?  This would help us
determine what needs to be fixed.

> 2) About PyCOD compiling Python functions to extension functions. If these
> Python functions are not simple (they may use 3rd party modules, complex
> Python features like classes, etc) then in order PyCOD to be applicable in
> these (and the most practical) cases, it should be able to transform _any_
> Python code to C.

For completeness, yes.  For usefulness, no.

> (There was a project that translated python code to C
> code, anyone knows what is the state of this project?) Anyway, I find this
> task rather difficult if not possible to accomplish in a reasonable amount
> of time. Otherwise, if PyCOD cannot do that, then I am afraid that it
> will be a toy program in scipy ;)

I guess I disagree.  I can think of many times that I've handed a Fortran
minimization library a function that just includes simple math expressions.
PyCOD will handle these, and they are a useful subset.  As I remember, the
concept for PyCOD came out of the need for calculating various things like
energy in parallel particle physics codes.  As soon as physicists wrote
their own functions in Python instead of using the canned C++ functions, the
code slowed down a *huge* amount.  PyCOD solves this problem.
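
To make "simple math expressions" a little more concrete, here is one way a
tool could test whether a function falls inside such a subset.  This is a
sketch of the idea only and makes no claim about how PyCOD decides what it
can compile; the name is_simple_math and the particular whitelist of syntax
nodes are assumptions chosen for illustration.

```python
import ast

# Syntax nodes a hypothetical "simple math" subset might accept: function
# definitions whose bodies are assignments and returns built from arithmetic
# on names and numeric constants.
_ALLOWED = (
    ast.Module, ast.FunctionDef, ast.arguments, ast.arg,
    ast.Assign, ast.Return, ast.BinOp, ast.UnaryOp,
    ast.Name, ast.Constant, ast.expr_context,
    ast.operator, ast.unaryop,
)


def is_simple_math(source):
    """Return True if every syntax node in source is in the allowed subset."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED):
            return False
        # Only int and float constants pass; bool is rejected explicitly
        # because it is a subclass of int.
        if isinstance(node, ast.Constant) and (
            isinstance(node.value, bool)
            or not isinstance(node.value, (int, float))
        ):
            return False
    return True


energy_src = "def energy(m, v):\n    return 0.5 * m * v * v\n"
hairy_src = "def hairy(x):\n    import os\n    return len(os.listdir(x))\n"
```

Here energy_src passes the check and hairy_src does not, since it imports a
module and makes calls.  A function that fails the check would simply be
left as ordinary Python.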

>
> 3) About callbacks. Indeed, when using 'some hairy, expensive python code'
> from C/C++/Fortran, most of the time is spent in Python (as Prabhu tests
> show, pure Python is approx 1000 times slower than any compiled language
> mentioned above). However, this 'some hairy, expensive python code' need
> not to be pure Python code, it may contain calls to extension modules that
> will speed up these callback functions. So, I don't think that
> calling callbacks from C/C++/Fortran would be remarkably expensive unless
> these callbacks will be really simple (true, sometimes they just are).
>
> On Fri, 8 Feb 2002, Pat Miller wrote:
>
> <snip>
>
> > It might be better if the integration routine, using its knowledge that
> > the argument to x must be a PyFloat (C++ double) could use a C++
> > accelerated function instead of slow callbacks to Python.  Not important
> > for a quick numeric integration, but crucial for using Python as a
> > true C/FORTRAN replacement.
>
> The last statement I find hard to believe. It is almost too trivial to
> write down few reasons (don't let your customers to know about these ;-):
> 1) Python (or any other scripting language) cannot never compete with
> C/FORTRAN in performance. (I truly hope that you can prove me to be
> wrong here.)

I'm confident we will come close in many (useful) cases.

> 2) There are too many high-performance and well-tested C/FORTRAN codes
> available (atlas,lapack,fftw to name just few) and to repeat their
> implementation in Python is unthinkable, in addition to the point 1).

The idea isn't to re-write these.  The SciPy approach of leveraging
netlib.org to the hilt is still in full effect.  But when people need to
customize behavior by writing their own scripts, you'd like to make it
possible for these to run as quickly as possible. Right?  If they happen to
write a function that weave/PyCOD won't accelerate, the worst thing that
will happen is that it executes at the speed of Python.
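
That fallback behavior can be sketched in a few lines.  Everything named
here (accelerate, compile_func, and the two toy backends) is hypothetical and
stands in for a real code generator; the only point is that an unsupported
function comes back unchanged rather than failing.

```python
def accelerate(compile_func):
    """Build a decorator that tries compile_func and falls back to Python.

    compile_func is a hypothetical stand-in for a compiler backend: it takes
    a Python function and either returns a faster callable or raises
    NotImplementedError when the function is outside what it supports.
    """
    def decorator(func):
        try:
            fast = compile_func(func)
        except NotImplementedError:
            # Nothing lost: the caller gets the original at Python speed.
            func.accelerated = False
            return func
        fast.accelerated = True
        return fast
    return decorator


def refuse_everything(func):
    """Toy backend that supports nothing, to exercise the fallback path."""
    raise NotImplementedError(func.__name__)


def wrap_everything(func):
    """Toy backend that 'compiles' by wrapping; no real speedup happens."""
    def fast(*args):
        return func(*args)
    return fast


@accelerate(refuse_everything)
def potential(x):
    return 0.5 * x * x


@accelerate(wrap_everything)
def kinetic(m, v):
    return 0.5 * m * v * v
```

Both functions give the same answers they would without the decorator; only
the accelerated attribute records which path was taken.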

>
> Simple, transparent, and straightforward mixing of Python/C/C++/Fortran
> languages seems to be the way to go in scientific computing within Python.
>

The truth is there are relatively few people who want to go through the
learning curve of mixing languages.  Even when it is made as simple as f2py
makes it, there is a lot to know about wrapping and debugging.  Those of us
who enjoy developing libraries don't have a problem with it.  95+% of the
potential user base for Python in Science will never write an extension
module.  They'll use extensions you wrote, but they'd rather spend their
time thinking about xxx (fill in your favorite topic here).  weave on its
own provides a means for old C/C++ hacks to add their own C with at least
somewhat less effort (and weave.blitz with a lot less effort).  PyCOD makes
it that much easier.

Like I said earlier, I don't know where PyCOD's limits are.  We may hit a
brick wall on it at some point.  I'm fairly confident, though, that Pat can
squeeze this lemon about as hard as anyone out there.  Further, PyCOD's
current capabilities are only barely short of useful in my mind, and its
possibilities are certainly exciting enough for me to spend a week or so
making weave play nice with it.

I guess we'll have to poll you at the end of the summer to see if we've (or
weave... :) changed your mind.  One of its major benefits (C callbacks) will
require some cooperation with f2py so that Fortran wrappers check whether
the passed in object has a C representation to call instead of automatically
calling back into Python.  We'll try and make this as easy as possible, but,
of course, you'll have to sign on as a weave believer for the integration to
work well.
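
The check described above might look something like this from the Python
side.  It is a sketch only: the attribute name c_impl is made up, integrate
is a toy midpoint rule rather than any real wrapped routine, and math.sqrt
(which is implemented in C) merely plays the part of a compiled twin.  A real
wrapper would call the C function directly from compiled code instead of
going back through the interpreter.

```python
import math


def integrate(func, a, b, n=1000):
    """Midpoint-rule integration that prefers a compiled form of func.

    The attribute name c_impl is hypothetical; it stands in for whatever
    handle a wrapper would use to find a C version of func.
    """
    kernel = getattr(func, "c_impl", None)
    if kernel is None:
        kernel = func  # No compiled form: call back into Python per sample.
    h = (b - a) / n
    return h * sum(kernel(a + (i + 0.5) * h) for i in range(n))


def py_sqrt(x):
    """Plain Python callback."""
    return x ** 0.5


# Attach the C-implemented math.sqrt as the stand-in compiled version.
py_sqrt.c_impl = math.sqrt


def square(x):
    """Plain Python callback with no compiled version attached."""
    return x * x


area_sqrt = integrate(py_sqrt, 0.0, 1.0)    # takes the c_impl path
area_square = integrate(square, 0.0, 1.0)   # falls back to the Python call
```

The caller passes an ordinary Python object either way, so code that knows
nothing about the fast path keeps working unchanged.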

weave/PyCOD aren't some silver bullet that solves every problem.  However,
they will solve many performance problems and I believe they are worth
pursuing.

thanks for your thoughtful comments,
eric