parallel csv-file processing

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Fri Nov 9 07:10:30 EST 2007


On Fri, 09 Nov 2007 02:51:10 -0800, Michel Albert wrote:

> Obviously this won't work as you cannot access a slice of a csv-file.
> Would it be possible to subclass the csv.reader class in a way that
> you can somewhat efficiently access a slice?

An arbitrary slice?  I guess not: the lines are not equally long, so all
records before the slice must be read to find out where it starts.
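
To illustrate, here is a small sketch.  The sample data and the `counting`
helper are made up for this example, and it assumes a current Python 3.
`itertools.islice()` can pick records 2 and 3 out of a reader, but the
records in front of them still get read:

```python
import csv
import io
import itertools

# Made-up sample data with lines of different lengths.
data = io.StringIO("a,1\nbbbbbb,22222\ncc,3\ndddd,4444\ne,5\n")

# Every line the csv reader pulls from the file ends up in this list.
consumed = []


def counting(lines):
    """Yield lines unchanged while recording each one that is read."""
    for line in lines:
        consumed.append(line)
        yield line


reader = csv.reader(counting(data))

# Ask only for the records at index 2 and 3.
rows = list(itertools.islice(reader, 2, 4))

# `rows` holds the two requested records, but `consumed` also contains
# the two lines in front of them: they were read and thrown away.
```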

> The obvious way is to do the following:
> 
> buffer = []
> for line in reader:
>    buffer.append(line)
>    if len(buffer) == 1000:
>       f = job_server.submit(calc_scores, buffer)
>       buffer = []

With `itertools.islice()` this can be written as:

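import itertools

while True:
    # Take up to 1000 records; the final chunk may be shorter.
    buffer = list(itertools.islice(reader, 1000))
    if not buffer:
        # The reader is exhausted.
        break
    f = job_server.submit(calc_scores, buffer)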
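
The same idea can be wrapped in a generator so the chunking is separate from
the job submission.  This is only a sketch: the `chunks` name and the sample
data are made up, and the job server is left out so that it runs on its own.

```python
import csv
import io
import itertools


def chunks(iterable, size):
    """Yield successive lists of up to `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        chunk = list(itertools.islice(iterator, size))
        if not chunk:
            # Nothing left to read.
            return
        yield chunk


# Made-up sample data: five records, taken two at a time.
reader = csv.reader(io.StringIO("a,1\nb,2\nc,3\nd,4\ne,5\n"))
sizes = [len(chunk) for chunk in chunks(reader, 2)]
```

With such a helper the loop becomes `for buffer in chunks(reader, 1000):`
followed by the `job_server.submit(calc_scores, buffer)` call.  The last,
shorter chunk is yielded as well, so no records are left over at the end.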


