[SciPy-user] python (against java) advocacy for scientific projects

David Cournapeau cournape at gmail.com
Tue Jan 20 13:13:27 EST 2009


On Tue, Jan 20, 2009 at 11:27 PM, Ravi <lists_ravi at lavabit.com> wrote:

>
> Really? Try writing just a fixed-point radix-8 FFT which handles complex input
> vectors up to, say, 64K in length with flexible rounding/clipping strategies
> with Fortran/python. I bet one could not write one that is even half as
> maintainable and half the performance of the C++ version. Or, for that matter,
> try writing something like Macaulay2 (or any nontrivial group-theoretic
> algorithms) on Fortran/python.

The reference FFT implementation is FFTW. It uses neither C++ nor
Fortran. It does not have rounding/clipping strategies that I know of,
but it is certainly as flexible as anything you can write in C++:
multiple sizes and dimensions, multiple strategies and architectures.

>
> Code maintainability works by using clearly defined idioms.

That's really only part of the story. Code maintainability also
requires the idioms to be well shared and understood by the community
- which C++ makes really hard to ensure because it is such a complex
beast. C++ is unmaintainable without a strong set of coding rules,
which only really works in companies, or when you have an already
strong framework (in open source, it is quite striking that C++ is
seldom used, except for complex GUI programs).

I have no reason to doubt your experience that templates lead to
maintainable code - but my experience is exactly the opposite, often
with code which is supposed to be state of the art (boost).

>
> First, Fortran, as I pointed out above, is generally worthless for a lot of
> computation-intensive problems that don't map to its native data types.
>
> Second, Fortran is not magic; it simply uses optimized libraries underneath
> and the speed of Fortran compiled code depends upon the libraries

Part of Fortran's speed comes from the fact that Fortran does not have
pointers. Pointers cause huge problems for optimization, because the
compiler must assume that any two of them may refer to the same memory.
And meta-programming as done in C++ is nothing new; there are similar
schemes with much better syntax, and much more power, in higher level
languages - for example scheme + stalin, ocaml + code generator,
faust for real time signal processing, etc. C++ templates are to
those systems what punch cards are to python.
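
To make the pointer problem concrete: the underlying issue is usually
called aliasing. Below is a minimal sketch, in Python purely for
illustration (add_twice and add_twice_cached are made-up names, and
one-element lists stand in for C pointers). The second function is the
version an optimizer would like to produce, with the source value read
once and kept in a register; it is only equivalent to the first when
the two arguments do not share storage, which a compiler generally
cannot prove for two arbitrary pointers.

```python
def add_twice(dst, src):
    """Add src[0] to dst[0] two times, re-reading src[0] at every step."""
    dst[0] += src[0]
    dst[0] += src[0]
    return dst[0]


def add_twice_cached(dst, src):
    """Same operation, but src[0] is read once and kept in a local.

    This is the rewrite an optimizer would like to apply. It is only
    valid when dst and src are known not to share storage.
    """
    s = src[0]
    dst[0] += s
    dst[0] += s
    return dst[0]


# Distinct storage: both versions compute 1 + 10 + 10.
a, b = [1], [10]
no_alias_plain = add_twice(a, b)
a, b = [1], [10]
no_alias_cached = add_twice_cached(a, b)

# Shared storage: the first write changes what the second read sees,
# so the two versions no longer agree.
x = [1]
alias_plain = add_twice(x, x)          # 1 -> 2 -> 4
x = [1]
alias_cached = add_twice_cached(x, x)  # 1 -> 2 -> 3
```

With distinct arguments both functions return 21; with the same list
passed twice they return 4 and 3 respectively, so the cached form is
not a safe replacement in general.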

> but you can
> beat those libraries from C++ (because template metaprogramming can be used to
> provide more information to the compiler), e.g., see
>  http://eigen.tuxfamily.org/index.php?title=Benchmark

I think something like eigen will not suit python developers much.
First, it has dreadful compilation times (like everything
template-based). Second, I could never reproduce their performance
numbers. I have never seen such a difference between MKL and ATLAS as
shown in their benchmark. Since they don't give enough information, it
is hard to tell which ATLAS they used, but in my experience, running
the benchmark they provide, ATLAS (and of course MKL) was always much
faster than eigen, on both Mac OS X (with Accelerate, which is mostly
a customized ATLAS, at least at its core) and Linux. At this point, I
don't understand what they are measuring.

I also note that they claim to be much faster than blitz, which itself
was supposed to match Fortran speed. There is a fundamental
contradiction somewhere, which puzzles me :)

>
> Third, computation speed now on CotS processors depends more on cache & memory
> access optimization than anything else, which compilers can do with C/C++ just
> as well as with Fortran;

No, they can't. At least in standard C++, you can't provide enough
information about pointers. But even then, it is often only 2 or 3
times slower - which rarely matters for scientific programming, except
for the biggest simulations. That's something that many C++ developers
don't seem to understand for some reason; I remember that one eigen
developer asked me once whether I would prefer coding in 3 days
something which runs in 3 hours or running in 3 days something which
took 3 hours to program - we both had an obvious answer to this
question, and you can guess it was not the same for both of us.

For real time programming (for signal processing kind of stuff for
example), this may matter, and indeed, C++ may be the best available
tool for this - it is certainly the de facto language for "real time"
music software, for example.

> You could object that writing
> such a C++ library is difficult, but the point is that Eigen or MTL needs to
> be written only once (just as you would write only once the Fortran compiler
> where this knowledge is embedded for Fortran).

But the point is that it is difficult for no reason but a dreadful
syntax. Something like eigen could be done in a higher level language.
To each his own, I guess, but I don't understand the joy
of spending time coding and debugging template code. It is just awful
- the compiler often cannot tell you even the line which has a syntax
error.

Something like fftw, with a code generator written in a high level
language, is a much better example of meta programming IMHO. It is
readable, flexible, and portable, at least in comparison to anything
C++ has to offer today.
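
As a rough illustration of the code generator idea (this is not how
fftw's own generator works internally, only a toy in the same spirit):
the sketch below is a small Python function that writes out the source
of an unrolled DFT of a fixed size. The twiddle factors are computed
at generation time, terms with a zero coefficient are dropped, and
coefficients of +1 or -1 need no multiplication. The names
generate_dft and dft4 are made up for this example, and the generated
code is Python rather than C only so that the sketch stays
self-contained.

```python
import cmath
import math


def generate_dft(n, name):
    """Return Python source for an unrolled complex DFT of size n."""

    def term(coef, var):
        # Compare a rounded copy so that floating-point noise does not
        # hide exact 0 and +/-1 coefficients.
        r = round(coef, 12)
        if r == 0:
            return None
        if r == 1:
            return f"+ {var}"
        if r == -1:
            return f"- {var}"
        return f"+ ({coef!r}) * {var}"

    lines = [f"def {name}(re, im):"]
    for k in range(n):
        re_terms, im_terms = [], []
        for j in range(n):
            w = cmath.exp(-2j * math.pi * j * k / n)
            c, d = w.real, w.imag
            # (re + i*im) * (c + i*d) = (c*re - d*im) + i*(d*re + c*im)
            re_terms += [term(c, f"re[{j}]"), term(-d, f"im[{j}]")]
            im_terms += [term(d, f"re[{j}]"), term(c, f"im[{j}]")]
        # The j = 0 twiddle is always 1, so each sum has at least one term.
        lines.append(f"    r{k} = " + " ".join(t for t in re_terms if t))
        lines.append(f"    i{k} = " + " ".join(t for t in im_terms if t))
    out_re = ", ".join(f"r{k}" for k in range(n))
    out_im = ", ".join(f"i{k}" for k in range(n))
    lines.append(f"    return [{out_re}], [{out_im}]")
    return "\n".join(lines) + "\n"


# Generate a size-4 transform, compile it, and run it.
source = generate_dft(4, "dft4")
namespace = {}
exec(source, namespace)
dft4 = namespace["dft4"]
out_re, out_im = dft4([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0])
```

For size 4 every twiddle factor is 0, +1 or -1, so the generated
function contains only additions and subtractions; print(source) shows
the straight-line code. The generator itself stays an ordinary,
debuggable program, which is the point of doing the meta programming
in a high level language.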

David


