Large amount of files to parse/organize, tips on algorithm?

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Tue Sep 2 13:06:56 EDT 2008


On Tue, 02 Sep 2008 09:48:32 -0700, cnb wrote:

> I have a bunch of files consisting of movie reviews.
> 
> For each file I construct a list of reviews, and then for each new file I
> merge the reviews, so that in the end I have a list of reviewers and, for
> each reviewer, all their reviews.
> 
> What is the fastest way to do this?

Use the timeit module to find out.
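
As a minimal sketch of what such a measurement could look like (the
function names, the synthetic data and the sizes are all made up for
illustration, and none of it comes from the original question), here are
two candidate merges timed against each other:

```python
import timeit
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Synthetic (reviewer, review) pairs; 100 reviewers and 10000 reviews
# are arbitrary assumptions, so substitute your real data.
REVIEWS = [("reviewer%d" % (i % 100), "review text %d" % i)
           for i in range(10000)]

def merge_with_dict(pairs):
    """Group reviews by reviewer using a dictionary of lists."""
    merged = defaultdict(list)
    for reviewer, review in pairs:
        merged[reviewer].append(review)
    return dict(merged)

def merge_by_sorting(pairs):
    """Sort by reviewer, then group adjacent entries."""
    ordered = sorted(pairs, key=itemgetter(0))  # stable sort
    return {reviewer: [review for _, review in group]
            for reviewer, group in groupby(ordered, key=itemgetter(0))}

if __name__ == "__main__":
    for func in (merge_with_dict, merge_by_sorting):
        # Take the minimum of several repeats to reduce timing noise.
        seconds = min(timeit.repeat(lambda: func(REVIEWS),
                                    number=10, repeat=3))
        print("%s: %.4f s per 10 runs" % (func.__name__, seconds))
```

The numbers only mean something for data that resembles yours, so it is
worth rerunning this with realistic review counts and lengths.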


> 1. Create one file with reviews, open the next file and for each review see
> if the reviewer exists, then add the review, else create a new reviewer.
> 
> 2. Create all the separate files with reviews, then mergesort them?

The answer will depend on whether you have three reviews or three 
million, whether each review is twenty words or twenty thousand words, 
and whether you have to do the merging once only or over and over again.
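
For a sense of what the first option might look like, here is a rough
sketch that keeps one dictionary keyed by reviewer and folds each file
into it. The tab-separated "reviewer<TAB>review" line layout and the names
parse_reviews and merge_file are assumptions for illustration only; your
files will almost certainly need a different parser.

```python
from collections import defaultdict

def parse_reviews(lines):
    """Yield (reviewer, review) pairs from 'reviewer<TAB>review' lines.

    The tab-separated layout is an assumption made for this sketch.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        reviewer, _, review = line.partition("\t")
        yield reviewer, review

def merge_file(merged, lines):
    """Add every review found in lines to the merged mapping."""
    for reviewer, review in parse_reviews(lines):
        # defaultdict creates the list the first time a reviewer is seen.
        merged[reviewer].append(review)
    return merged

# Lists of strings stand in for open file objects here.
file_a = ["alice\tGreat film\n", "bob\tToo long\n"]
file_b = ["alice\tWeak sequel\n", "carol\tA classic\n"]

merged = defaultdict(list)
for lines in (file_a, file_b):
    merge_file(merged, lines)
```

If the merging has to be repeated as new files arrive, the same dictionary
can be kept around and only the new files passed to merge_file.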


-- 
Steven


