parallel csv-file processing

Paul Boddie paul at boddie.org.uk
Fri Nov 9 07:48:42 EST 2007


On 9 Nov, 12:02, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
>
> Why not pass the disk offsets to the job server (untested):
>
>    n = 1000
>    for i,_ in enumerate(reader):
>      if i % n == 0:
>        job_server.submit(calc_scores, reader.tell(), n)
>
> the remote process seeks to the appropriate place and processes n lines
> starting from there.
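
As a rough sketch of the offset idea quoted above (also untested against
real data): csv reader objects have no tell() method, so this version
scans the underlying file in binary mode and counts bytes itself. The
names chunk_offsets and process_chunk are made up for illustration, and
it assumes UTF-8 data with no quoted fields containing embedded newlines,
since those would break line-based chunking.

```python
import csv


def chunk_offsets(path, n):
    """Return the byte offset at which each block of n lines starts."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:
        for i, line in enumerate(f):
            if i % n == 0:
                offsets.append(pos)
            pos += len(line)  # count bytes ourselves instead of calling tell()
    return offsets


def process_chunk(path, offset, n):
    """Seek to offset and parse at most n CSV rows starting there."""
    with open(path, "rb") as f:
        f.seek(offset)
        # readline() returns b"" at end of file; drop those empty results
        raw = [f.readline() for _ in range(n)]
    lines = [r.decode("utf-8") for r in raw if r]
    return list(csv.reader(lines))
```

Each (offset, n) pair from chunk_offsets could then be handed to a
worker, which only needs the file path and the pair to do its share.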

This is similar to a lot of the smarter solutions for Tim Bray's "Wide
Finder" - a problem apparently in the same domain. See here for more
details:

http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder

There is lots of discussion there about more than just parallel
processing and programming, too.

Paul



