Multi thread reading a file

Stefan Behnel stefan_ml at behnel.de
Wed Jul 1 01:30:13 EDT 2009


Gabriel Genellina wrote:
> En Tue, 30 Jun 2009 22:52:18 -0300, Mag Gam <magawake at gmail.com> escribió:
> 
>> I am very new to python and I am in the process of loading a very
>> large compressed csv file into another format.  I was wondering if I
>> can do this in a multi thread approach.
> 
> Does the format conversion involve a significant processing time? If
> not, the total time is dominated by the I/O time (reading and writing
> the file) so it's doubtful you gain anything from multiple threads.

Well, the OP didn't say anything about multiple processors, so multiple
threads may not help wrt. processing time. However, if the file is large
and the OS can schedule the I/O in a way that a seek disaster is avoided
(although that's hard to assure with today's hard disk storage density, but
SSDs may benefit), multiple threads reading multiple partial streams may
still reduce the overall runtime due to increased I/O throughput.

That said, the OP was mentioning that the data was compressed, so I doubt
that the I/O bandwidth is a problem here. As another poster put it: why
bother? Run a few benchmarks first to see where (and if!) things really get
slow, and then check what to do about the real problem.

Stefan



More information about the Python-list mailing list