Large amount of files to parse/organize, tips on algorithm?

Paul Rubin
Tue Sep 2 14:28:39 EDT 2008


cnb <circularfunc at yahoo.se> writes:
> For each file I construct a list of reviews and then for each new file
> I merge the reviews so that in the end I have a list of reviewers and
> for each reviewer all their reviews.
> 
> What is the fastest way to do this?

Scan through all the files sequentially, emitting records like

(movie, reviewer, review)

Then use an external sort utility to sort/merge that output file
on each of the 3 columns.  Beats writing code.
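
A rough sketch of that idea in Python, in case it helps. The post gives no
input format, so this assumes each file has already been parsed into
(movie, reviewer, review) tuples. The tab-separated layout and the names
emit_records and group_by_reviewer are made up for illustration.

```python
from itertools import groupby


def emit_records(reviews, out):
    """Write one tab-separated (movie, reviewer, review) line per review.

    `reviews` is any iterable of 3-tuples of strings; `out` is a writable
    text file. Both the name and the record layout are assumptions.
    """
    for record in reviews:
        # Collapse tabs and newlines inside each field so every record
        # stays on a single line and the column separator is unambiguous.
        fields = [" ".join(field.split()) for field in record]
        out.write("\t".join(fields) + "\n")


def group_by_reviewer(sorted_lines):
    """Yield (reviewer, [(movie, review), ...]) pairs.

    Expects lines that are ALREADY sorted on the reviewer column,
    for example by an external sort run on the emitted file.
    """
    rows = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for reviewer, group in groupby(rows, key=lambda row: row[1]):
        yield reviewer, [(row[0], row[2]) for row in group]
```

With the records written to a file (records.tsv here is a placeholder name),
sorting on the reviewer column with a POSIX sort could look like

    sort -t "$(printf '\t')" -k2,2 records.tsv > by_reviewer.tsv

after which group_by_reviewer can stream by_reviewer.tsv one reviewer at a
time without holding the whole data set in memory.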


