About alternatives to Matlab

Tue Dec 5 07:36:38 EST 2006

Filip Wasilewski wrote:
> Besides of that this code is irrelevant to the original one and your
> further conclusions may not be perfectly correct. Please learn first
> about the topic of your benchmark and different variants of wavelet
> transform, namely difference between lifting scheme and dwt, and try
> posting some relevant examples or use other topic for your benchmarks.

Your lifting implementation is slightly longer and slower than the
non-lifting one but its characteristics are identical. For n=2^25 I get:

1.88s C++ (816 non-whitespace bytes)
2.00s OCaml (741 b)
2.33s F# (741 b)
9.83s Your Python (1,002 b)

The original python was 789 bytes.

> Neglecting this issues, I wonder if it is equally easy to juggle
> arbitrary multidimensional arrays in a statically typed language and do
> operations on selected axis directly from the interactive interpreter
> like in the NumPy example from my other post -
> http://groups.google.com/group/comp.lang.python/msg/a71a5bf00a88ad57 ?
> I don't know much OCaml so it would be great if you could elaborate on
> this.

It is probably just as easy. Instead of dynamic typing you have parametric
polymorphism. If you want to make your function generic over arithmetic
type then you can pass in the arithmetic operators.

>> 0.56s C++ (direct arrays)
>> 0.61s F# (direct arrays)
>> 0.62s OCaml (direct arrays)
>> 1.38s OCaml (slices)
>> 2.38s Python (slices)
>> 10s Mathematica 5.1
>>
>> Note that all implementations are safe (e.g. C++ uses a.at(i) instead of
>> a[i]).
> 
> So why not use C++ instead of all others if the speed really matters?
> What is your point here?

1. Benchmarks should not just include two inappropriate languages.
2. Speed aside, the other languages are more concise.

> Could you please share full benchmark code, so we could make
> conclusions too?

I'll paste the whole programs at the end...

> I would like to get to know about your benchmark 
> methodology. I wonder if you are allocating the input data really
> dynamically or whether it's size is a priori knowledge for the
> compiler.

The knowledge is there for the compiler to use but I don't believe any of
them exploit it.

>> In this specific context (discrete wavelet transform benchmark), I'd have
>> said that speed was the most important thing after correctness.
> 
> Not necessarily, if you are doing research with different kinds of
> wavelets and need a general, flexible and easily adaptable tool or just
> the computation speed is not considered a bottleneck.

Brevity is probably next.

> Language speed is a great advantage, but if it always was the most
> important, people would talk directly to the CPUs or knit DSP chips in
> theirs backyards whenever they need to solve a calculation problem.

Sure.

> Please don't make this discussion heading towards `what is the fastest
> way of doing d4 transform and why OCaml rules` instead of `what is the
> optimal way of doing things under given circumstances and still have a
> free afternoon ;-)`.

Comments like that seem to be based on the fundamental misconception that
writing C++ like this is hard.

> Different tasks need different, well-balanced 
> measures. Neither Python nor OCalm nor any other language is a panacea
> for every single problem.

Absolutely. OCaml is as good as the next (compiled) language in this case.
Python and Matlab seem to be particularly bad at this task.

Here's my complete OCaml:

let a = sqrt 3. and b = 4. *. sqrt 2.
let h0, h1, h2, h3 =
  (1. +. a) /. b, (3. +. a) /. b, (3. -. a) /. b, (1. -. a) /. b
let g0, g1, g2, g3 = h3, -.h2, h1, -.h0

let rec d4_aux tmp a n =
  let n2 = n lsr 1 in
  for i=0 to n2-2 do
    tmp.(i) <- a.(i*2)*.h0+.a.(i*2+1)*.h1+.a.(i*2+2)*.h2+.a.(i*2+3)*.h3;
    tmp.(i+n2) <- a.(i*2)*.g0+.a.(i*2+1)*.g1+.a.(i*2+2)*.g2+.a.(i*2+3)*.g3;
  done;
  tmp.(n2-1) <- a.(n-2)*.h0 +. a.(n-1)*.h1 +. a.(0)*.h2 +. a.(1)*.h3;
  tmp.(2*n2-1) <- a.(n-2)*.g0 +. a.(n-1)*.g1 +. a.(0)*.g2 +. a.(1)*.g3;
  Array.blit tmp 0 a 0 n;
  if n > 4 then d4_aux tmp a (n lsr 1)

let d4 a =
  let tmp = Array.make (Array.length a) 0. in
  d4_aux tmp a (Array.length a)

let () =
  print_endline "Generating test data...";
  let x = Array.init (1 lsl 25) (fun _ -> Random.float 1.) in
  print_endline "Benchmarking...";
  let t1 = Sys.time() in
  ignore(d4 x);
  Printf.printf "Elapsed time is %.6f seconds\n" (Sys.time() -. t1)

and C++:

#include <iostream>
#include <vector>
#include <ctime>
#include <cmath>

using namespace std;

double a = sqrt(3), b = 4 * sqrt(2);
double h0 = (1 + a) / b, h1 = (3 + a) / b, h2 = (3 - a) / b, h3 = (1 - a) /
b;
double g0 = h3, g1 = -h2, g2 = h1, g3 = -h0;

void d4(vector<double> &a) {
  vector<double> tmp(a.size());
  for (int n = a.size(); n >= 4; n >>= 1) {
    int half = n >> 1;

    for (int i=0; i<n-2; i+=2) {
      tmp.at(i/2) = a.at(i)*h0 + a.at(i+1)*h1 + a.at(i+2)*h2 + a.at(i+3)*h3;
      tmp.at(i/2+half) = a.at(i)*h3 - a.at(i+1)*h2 + a.at(i+2)*h1 -
a.at(i+3)*h0
;
    }

    tmp.at(half-1) = a.at(n-2)*h0 + a.at(n-1)*h1 + a.at(0)*h2 + a.at(1)*h3;
    tmp.at(2*half-1) = a.at(n-2)*h3 - a.at(n-1)*h2 + a.at(0)*h1 -
a.at(1)*h0;

    copy(tmp.begin(), tmp.begin() + n, a.begin());
  }
}

int main() {
  cout << "Generating data...\n";
  vector<double> a(1 << 25);
  for (int i=0; i<a.size(); ++i) a.at(i) = rand();
  cout << "Benchmarking...\n";
  double t1 = clock();
  d4(a);
  cout << "Elapsed time " << (clock() - t1) / CLOCKS_PER_SEC << "s\n";
}

-- 
Dr Jon D Harrop, Flying Frog Consultancy
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet