About alternatives to Matlab

sturlamolden sturlamolden at yahoo.no
Mon Dec 4 10:39:13 EST 2006


Filip Wasilewski wrote:

> So why not use C++ instead of all others if the speed really matters?
> What is your point here?

If speed matters, one should consider using hand-coded assembly.
Nothing really beats that. But it's painful and not portable, so let's
forget about that for a moment.

Contrary to what most computer scientists think, C++ is not the tool of
choice if speed really matters. Numeric computing is typically done in
Fortran 90/95.

That is not without reason. There are aspects of the C++ language that
make certain numerical optimizations impossible, e.g. the potential
for pointer aliasing within tight loops, which hampers register
allocation. For example:

for (i = 0; i < n; i++) {
    *a += *b;    /* a may point at one of b's elements...   */
    *b++ = 0;    /* ...so *a must be reloaded on each pass  */
}

Here the C++ compiler cannot keep *a in a register, as a may alias one
of the elements b walks over. It must therefore reload the value *a
from memory in every iteration of the loop. In Fortran this cannot
happen (the standard disallows aliased dummy arguments), so it is
much easier for the compiler to do numerical optimizations.
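For readers more at home in NumPy, here is a sketch of the same point
(the function names and example arrays are mine): the single-accumulator
rewrite the compiler would like to do is only legal when a is known not
to point into b.

```python
import numpy as np

def no_alias(a, b):
    # If the compiler can prove a does not point into b, it may keep
    # *a in a register -- equivalent to accumulating the sum once:
    a[0] += b.sum()
    b[:] = 0
    return a, b

def with_alias(b, k):
    # But if a happens to point at b[k], the literal loop gives a
    # different answer, so the rewrite above would be wrong:
    for i in range(len(b)):
        b[k] += b[i]
        b[i] = 0
    return b

a, b = no_alias(np.array([3.0]), np.array([1.0, 2.0, 3.0, 4.0]))
# a[0] == 13.0 and b is all zeros
aliased = with_alias(np.array([1.0, 2.0, 3.0, 4.0]), k=2)
# aliased == [0., 0., 4., 0.] -- so *a cannot safely be hoisted
```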

One should also consider using ATLAS or a vendor-optimized BLAS (e.g.
Intel's Math Kernel Library) for some of the axpy operations in the
wavelet transform, unless one has an expensive Fortran compiler that
takes care of this automatically (in which case one should just use
slicing and let the compiler do the rest).
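In NumPy terms, an axpy step (y <- alpha*x + y) is just a whole-array
statement. The names below are illustrative, and whether such operations
end up in ATLAS/MKL depends on how your NumPy and its BLAS were built:

```python
import numpy as np

alpha = 0.5
x = np.arange(8, dtype=np.float64)
y = np.ones(8)

# axpy: y <- alpha*x + y, written as a single array statement.
# NumPy executes this in compiled, vectorized loops; a BLAS-backed
# build routes larger kernels (dot, matmul) to ATLAS/MKL.
y += alpha * x
```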

I don't agree that slicing is the wrong way to approach this problem.
In both NumPy and Matlab, slicing (often) results in calls to
BLAS/ATLAS, which are heavily optimized numerical routines. One can
supply BLAS libraries that automatically exploit multiple CPUs, use
FMA (fused multiply-add) instructions whenever appropriate, and make
sure the cache is used optimally.
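To make this concrete for the wavelet case, here is one level of an
(unnormalized) Haar transform written entirely with strided slices.
This is a minimal sketch of my own; the Haar filter is a stand-in, not
necessarily the filter under discussion:

```python
import numpy as np

def haar_step(x):
    # One decomposition level via strided slices; each line is a
    # whole-array operation executed in optimized compiled loops.
    s = (x[0::2] + x[1::2]) / 2.0   # lowpass: pairwise averages
    d = x[0::2] - x[1::2]           # highpass: pairwise differences
    return s, d

def haar_inverse(s, d):
    # Invert the step above, again with slice assignments only.
    x = np.empty(2 * len(s))
    x[0::2] = s + d / 2.0
    x[1::2] = s - d / 2.0
    return x
```

Round-tripping haar_inverse(*haar_step(x)) reproduces x exactly, and no
explicit Python loop ever touches the data.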

Vendors of Fortran compilers thrive on scientists' need for speed.
Fortran compilers are therefore extremely good at taking slicing and
"vectorizing" the corresponding machine code. For those who don't know
Fortran, it has slicing operations similar to those of NumPy and
Matlab. If slicing were not the right tool for numerical work, Fortran
would not support it. Good Fortran compilers can automatically detect
when a call to BLAS is appropriate, use FMA opcodes whenever possible,
use SIMD extensions (e.g. SSE), or even spread the work onto multiple
CPUs in a cluster via MPI. This is possible because of the implicit
parallelism in Fortran's array slicing: a sliced assignment may be
evaluated in any order, so it is effectively a parallel SIMD
statement. SIMD is a very powerful, perhaps the preferred, paradigm
for numerical parallel computing. In Fortran 90/95 a programmer will
use array slicing, and the compiler is free to parallelize it however
it wants. In contrast, the MIMD paradigm often used for parallel
computing in C and C++ (e.g. multi-threading) is a lot harder both for
the programmer to code and for the compiler to optimize.
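NumPy's slice assignment has the same as-if-parallel semantics (since
NumPy 1.13, overlapping views are detected and buffered), mirroring
Fortran's rule that the right-hand side is evaluated before any element
is stored:

```python
import numpy as np

a = np.arange(6)     # [0 1 2 3 4 5]
# Like Fortran's  a(2:6) = a(1:5):  the right-hand side is (as if)
# fully evaluated before storing, so this overlapping shift gives
# the parallel answer, not a serial smear of a[0] over the array.
a[1:] = a[:-1]
# a is now [0 0 1 2 3 4]
```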

Not to mention the fact that Fortran 95 is a lot easier to write than
C++. Also, those who use Fortran tend to be scientists, not
professional programmers. C++ introduces a higher level of complexity
and entire new classes of bugs; this is highly undesirable. Fortran is
a small, nice language for numerical computing. Anything else should
rather be done in Python than in C++.

I would urge you all to read this:

http://latticeqcd.blogspot.com/2006/11/why-we-use-fortran-and-python.html



