Wow, Python much faster than MatLab

Sat Dec 30 22:51:22 EST 2006

Stef Mientki wrote:

> MatLab: 14 msec
> Python:  2 msec

I have the same experience. NumPy is usually faster than Matlab. But it
very much depends on how the code is structured.

I wonder if it is possible to improve the performance of NumPy by
having its fundamental types in the language, instead of depending on
operator overloading. For example, in NumPy, a statement like

array3[:] = array1[:] + array2[:]

allocates an intermediate array that is not needed. This is because the
operator overloading cannot know if it's evaluating a part of a larger
statement like

array1[:] = (array1[:] + array2[:]) * (array3[:] + array4[:])

If arrays had been a part of the language, as it is in Matlab and
Fortran 95, the compiler could see this and avoid intermediate storage,
as well as looping over the data only once. This is one of the main
reasons why Fortran is better than C++ for scientific computing. I.e.
instead of

for (i=0; i<n; i++)
  array1[i] = (array1[i] + array2[i]) * (array3[i] + array4[i]);

one actually gets something like three intermediates and four loops:

tmp1 = malloc(n*sizeof(whatever));
for (i=0; i<n; i++)
   tmp1[i] = array1[i] + array2[i];
tmp2 = malloc(n*sizeof(whatever));
for (i=0; i<n; i++)
   tmp2[i] = array3[i] + array4[i];
tmp3 = malloc(n*sizeof(whatever));
for (i=0; i<n; i++)
   tmp3[i] = tmp1[i] + tmp2[i];
free(tmp1);
free(tmp2);
for (i=0; i<n; i++)
  array1[i]  = tmp3[i];
free(tmp3);

In C++ this is actually further bloated by constructor, destructor and
copyconstructor calls.
Why one should use Fortran over C++ is obvious. But it also applies to
NumPy, and also to the issue of Numpy vs. Matlab, as Matlab know about
arrays and has a compiler that can deal with this, whilst NumPy depends
on bloated operator overloading. On the other hand, Matlab is
fundamentally impaired on function calls and array slicing compared
with NumPy (basically copies are created instead of views). Thus, which
is faster - Matlab or NumPy - very much depends on how the code is
written.

Now for my question: operator overloading is (as shown) not the
solution to efficient scientific computing. It creates serious bloat
where it is undesired. Can NumPy's performance be improved by adding
the array types to the Python language it self? Or are the dynamic
nature of Python preventing this?

Sturla Molden