Processing large CSV files - how to maximise throughput?

Dave Angel davea at davea.name
Thu Oct 24 23:57:17 EDT 2013


On 24/10/2013 23:35, Steven D'Aprano wrote:

> On Fri, 25 Oct 2013 02:10:07 +0000, Dave Angel wrote:
>
>>> If I have multiple large CSV files to deal with, and I'm on a
>>> multi-core machine, is there anything else I can do to boost
>>> throughput?
>> 
>> Start multiple processes.  For what you're doing, there's probably no
>> point in multithreading.
>
> Since the bottleneck will probably be I/O -- reading data from and 
> writing data to files -- I expect threading may actually help.

We approach the tradeoff from opposite sides.  I would use
multiprocessing to exploit multiple cores, unless the communication
costs between the processes got too high.

They won't in this case: each worker needs only a filename going in
and a small summary coming back.
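
Something like this untested sketch is what I have in mind --
process_file(), the filenames, and the row counting are all placeholder
stand-ins for the OP's real workload:

    import csv
    import multiprocessing

    def process_file(path):
        # Hypothetical per-file worker: read one CSV, return a tiny summary.
        rows = 0
        with open(path, newline="") as f:
            for _ in csv.reader(f):
                rows += 1          # stand-in for the real per-row work
        return path, rows

    if __name__ == "__main__":
        paths = ["a.csv", "b.csv", "c.csv"]    # placeholder filenames
        # Pool() starts one worker per core by default; each worker gets
        # only a filename and sends back only a small tuple, so the
        # inter-process traffic stays negligible.
        with multiprocessing.Pool() as pool:
            for path, rows in pool.map(process_file, paths):
                print(path, rows)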

But I would concur -- both will probably give about the same speedup.
I just detest the pain that multithreading can bring, and avoid it if
at all possible.
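
That said, if Steven is right and the job really is I/O-bound, trying
threads for comparison is nearly a one-line change with
concurrent.futures, reusing the hypothetical process_file() and paths
from the sketch above:

    from concurrent.futures import ThreadPoolExecutor

    # max_workers=4 is a guess; tune it against the real files.
    with ThreadPoolExecutor(max_workers=4) as ex:
        results = list(ex.map(process_file, paths))

Timing both versions against a couple of the actual files would settle
the question quickly.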

-- 
DaveA