Parallelising code

Mon Sep 15 12:46:18 EDT 2008

I have some file-processing code that has to deal with quite a lot of
data. I have a quad-core machine, so I wondered whether I could take
advantage of some parallelism.

Essentially, I have a number of CSV files, let's say 100, each
containing about 8000 data points. For each point, I need to consult
some other data structures (generated in advance) and append the point
to the relevant list. I wondered whether I could get each core to
handle a few of the files. I have a few questions:

- Am I actually going to get any speedup from parallelism, or is it
likely that most of my processing time is spent reading files? I guess
I can profile for this?

- Is list.append() thread-safe? (I'm not sure this is the right
term.) What I mean is: can two separate processors file a point in the
same list at the same time without anything horrible happening? Do I
need to do anything special (a mutex or whatever) to make this safe,
or will it happen automatically?
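
To make the second question concrete, here is a minimal sketch of the
kind of threaded version I have in mind. Everything in it is made up
for illustration: LOOKUP stands in for my pre-generated data
structures, process_all and file_points are hypothetical names, and
the CSV data is passed in as strings rather than read from disk so the
example is self-contained. It guards the shared lists with an explicit
lock, which may well be more cautious than necessary.

```python
import csv
import io
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the lookup structures generated in advance:
# maps the first column of a row to the name of the list it belongs in.
LOOKUP = {"a": "left", "b": "right"}


def process_all(texts, workers=4):
    """File every (key, value) row from each CSV text into a shared list."""
    buckets = defaultdict(list)  # shared between all worker threads
    lock = threading.Lock()      # guards every access to `buckets`

    def file_points(text):
        # Each worker parses one CSV document and files its points.
        for key, value in csv.reader(io.StringIO(text)):
            with lock:
                buckets[LOOKUP[key]].append(float(value))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consume the iterator so any exception in a worker is re-raised.
        list(pool.map(file_points, texts))
    return dict(buckets)


if __name__ == "__main__":
    print(process_all(["a,1\nb,2\n", "a,3\nb,4\n"]))
```

The order of points within each list depends on how the threads are
scheduled, so anything that relies on ordering would need to sort
afterwards or tag each point with its source file.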

Thanks in advance for any guidance,

Peter