Parallel processing on shared data structures

Hendrik van Rooyen mail at microcorp.co.za
Fri Mar 20 04:05:16 EDT 2009


<psaffrey at lemail.com> wrote:



> I'm filing 160 million data points into a set of bins based on their
> position. At the moment, this takes just over an hour using interval

So why not make four sets of bins - one for each core of your quad -
split the points into quarters, run four processes, and merge the
results later?

This assumes that the actual filing process is the bottleneck, and
that the bins are just sets, where position etc. does not matter.

If it takes an hour just to read the input, then nothing you can do
will make it better.

- Hendrik




