About alternatives to Matlab

Jon Harrop jon at ffconsultancy.com
Sun Dec 3 12:01:27 EST 2006


Carl Banks wrote:
>>   fill a (map3 (fun b c d -> b + c + d) b c d)
>>
>> which will be much faster because it doesn't generate an intermediate
>> array.
> 
> Ah, but, this wasn't about temporaries when you spoke of "eagerly
> allocating arrays", was it?

I had initially thought that all of the array operations allocated new arrays,
but it seems that at least assignment to a slice does not.

Does:

  a[:] = b[:] + c[:]

allocate a temporary for b[:] + c[:]?

> But yes, you're right, in this example 
> temporary arrays are created; arrays that Ocaml would not need.  (I
> don't exactly understand why you'd need a functional language to get
> this optimization, though.)

I thought that functional programming could get you both the brevity of
Python's slicing and the performance of compiled C, but I was wrong.
You can get the brevity of the Python approach in C.

> You might have guessed that you can get rid of most temporary arrays,
> at the expense of readability, with numpy.  For example, replacing
> 
> d1[:] = odd[:] - C1*even[:]
> 
> with
> 
> numpy.multiply(-C1,even,d1)
> numpy.add(d1,odd,d1)
> 
> eliminates the intermediate.  No unnecessary array allocations; no
> closures necessary.

Right. Performance will probably go from 5x slower to 2x slower, because
you're still traversing the arrays twice instead of once.

> I'm curious whether this shared-slicing isn't, in some ways,
> advantageous over the Ocaml way.

That's exactly what I thought to start with. The author of F# is working on
getting slicing into his language but the use of slicing in this Python
benchmark corrupted my fragile little mind. Slicing is a terrible way to
approach this problem if you're using a compiled language like F#.

I first wrote an OCaml translation of the Python and wrote my own
little "slice" implementation. I have since looked up a C++ solution and
translated that into OCaml instead:

let rec d4_aux a n =
  let n2 = n lsr 1 in
  let tmp = Array.make n 0. in
  for i=0 to n2-2 do
    tmp.(i) <- a.(i*2)*.h0+.a.(i*2+1)*.h1+.a.(i*2+2)*.h2+.a.(i*2+3)*.h3;
    tmp.(i+n2) <- a.(i*2)*.g0+.a.(i*2+1)*.g1+.a.(i*2+2)*.g2+.a.(i*2+3)*.g3;
  done;
  tmp.(n2-1)   <- a.(n-2)*.h0 +. a.(n-1)*.h1 +. a.(0)*.h2 +. a.(1)*.h3;
  tmp.(2*n2-1) <- a.(n-2)*.g0 +. a.(n-1)*.g1 +. a.(0)*.g2 +. a.(1)*.g3;
  Array.blit tmp 0 a 0 n;
  if n > 4 then d4_aux a (n lsr 1)

let d4 a = d4_aux a (Array.length a)
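
For completeness: the code assumes the Daubechies D4 filter coefficients
h0..h3 and g0..g3, which are not shown here. A minimal sketch of those
definitions (they must precede d4_aux; sq3 and den are just local shorthands),
using the standard D4 values:

let sq3 = sqrt 3. and den = 4. *. sqrt 2.
let h0 = (1. +. sq3) /. den
let h1 = (3. +. sq3) /. den
let h2 = (3. -. sq3) /. den
let h3 = (1. -. sq3) /. den
let g0 = h3 and g1 = -.h2 and g2 = h1 and g3 = -.h0

d4 then transforms an array in place, e.g.:

let () =
  let a = Array.init 256 (fun i -> sin (float i /. 16.)) in
  d4 a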

Not only is that shorter than the Python, it is much faster:

0.56s C++ (direct arrays)
0.61s F# (direct arrays)
0.62s OCaml (direct arrays)
1.38s OCaml (slices)
2.38s Python (slices)
10s Mathematica 5.1

Note that all implementations are safe (e.g. C++ uses a.at(i) instead of
a[i]).

> I presume that in Ocaml, the way 
> you'd "share" array data is to create a closure that can apply some
> operation to selected elements of the array, returning the result.

Yes. That is certainly one way of doing it.

> But could this as easily modify the array in-place?

Absolutely. A closure would capture a reference to the array, and since arrays
are mutable you could alter the array from within the closure (see the small
sketch below). In F# you could even execute the closures concurrently to alter
different parts of the same array at the same time. I just tried that, and it
is actually slower to multithread this particular computation.
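
As a minimal illustration (the helper name scale_in_place is mine, just for
this sketch), here is an OCaml closure that captures an array and scales part
of it in place:

let scale_in_place a =
  (* the returned closure captures a reference to "a" *)
  fun k i j ->
    for idx = i to j - 1 do
      a.(idx) <- k *. a.(idx)
    done

let () =
  let a = [| 1.; 2.; 3.; 4. |] in
  let scale = scale_in_place a in
  scale 10. 0 2;   (* a is now [| 10.; 20.; 3.; 4. |] *)
  scale 0.5 2 4    (* a is now [| 10.; 20.; 1.5; 2. |] *)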

> I presume a good 
> compiler could be able to inline the function and optimize the call to
> an in-place operation, but I'm not sure how well this works in
> practice.

Surprisingly well, it seems: F# and OCaml are almost as fast as C and C++!

> Here's a version of D4_Transform that uses no temporary arrays (aside
> from the work arrays s1, d1, and d2, which are allocated only once).
> It's about 40% faster for me than the one with infix operations.  I'd
> be curious how it compares to a correct Ocaml version.  (I'd still
> expect Ocaml to be at least twice as fast.)

I get:

1.57s Python (in-place)

>> > It seems to
>> > me a big help is the ability to fold multiple array operations into a
>> > single loop, which is an optimization a dynamically-typed language like
>> > Python can't easily make.  (It'd require a really smart JIT compiler
>> > or some concessions in dynamicity.)
>>
>> Writing a JIT to compile this kind of stuff is easy.
> 
> Eh, getting a JIT to do the optimization I spoke of in Python is not
> easy.  It would be relatively easy in a statically typed language.  In
> Python, it'd be tantamount to rewriting a dynamically reconfigurable
> version of numpy--you'd have to duplicate numpy's complex
> type-dispatching rules.

There aren't any complicated types in the above code. In fact, there are
only two types: float and float array. Type checking is easy in this case,
compilation to C is also easy, and then you just dispatch to the compiled C
code when the types are right. You would want to write your Python code in a
C-like style, but in this case that style is actually shorter. You might also
want to flag code for compilation manually, to avoid the overhead of compiling
code unnecessarily.

>> My point is that this is fundamentally bad code,
> 
> Whoa, there.  I realize that for people who prefer functional
> programming, with their recursively reductionist way of thinking, it
> might seem as if leaving any sort of reducibility in the final result is
> "fundamentally bad", but that's a pretty strong thing to say.

I was referring specifically to the slicing, not to functional programming. My
F# and OCaml code is now basically identical to the C++ code.

>> so why bother trying to write a Python JIT? Why
>> not just write in a better language for this task? Optimising within a
>> fundamentally slow language seems silly to me.
> 
> Because, for most people, language choice is not a greedy maximizing of
> a single issue.  Nobody who uses numpy is under the impression that it
> can match a statically-typed language that is compiled to machine code
> in peak performance.  (Matlab is fair game, of course.)  But speed is
> not the only important thing.

In this specific context (discrete wavelet transform benchmark), I'd have
said that speed was the most important thing after correctness.

> I use Python because it's an excellently designed language that fits my
> overall needs, and numpy because sometimes I need more speed than
> vanilla Python.  Only when speed is critical (for example, 3D collision
> detection) do I write an extension in a "better language for the task"
> (i.e. C).

Have you looked at languages like F#, OCaml, Haskell, SML and so on? They
seem to offer the benefits of both worlds.

> This is something that's quite popular in the numerically-intensive
> computing community, BTW.  Many people use Python to handle boring
> stuff like file I/O, memory management, and non-critical numerical
> calculations, and write C or Fortran extensions to do the
> numerically-intensive stuff.

Yes, I appreciate that Python is hugely popular; I just don't understand why
it is quite so popular.

I found a series of lectures on VPython, showing some amazing stuff:

http://showmedo.com/videos/series?name=pythonThompsonVPythonSeries

I just found some more interesting stuff here:

http://www.phy.syr.edu/~salgado/software/vpython/

This is really great work but I can't help but wonder why the authors chose
to use Python when other languages seem better suited. I'd like to work on
raising people's awareness of these alternatives, and probably create some
useful tools in the process. So I'm keen to learn what Python programmers
would want/expect from F# and OCaml.

What would it take to make you convert?

-- 
Dr Jon D Harrop, Flying Frog Consultancy
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists


