Large amount of files to parse/organize, tips on algorithm?

cnb circularfunc at yahoo.se
Tue Sep 2 14:02:03 EDT 2008


On Sep 2, 7:06 pm, Steven D'Aprano <st... at REMOVE-THIS-
cybersource.com.au> wrote:
> On Tue, 02 Sep 2008 09:48:32 -0700, cnb wrote:
> > I have a bunch of files consisting of movie reviews.
>
> > For each file I construct a list of reviews, and then for each new
> > file I merge the reviews so that in the end I have a list of
> > reviewers and, for each reviewer, all their reviews.
>
> > What is the fastest way to do this?
>
> Use the timeit module to find out.
>
> > 1. Create one file with reviews, open the next file and, for each
> > review, see if the reviewer exists; if so, add the review, else
> > create a new reviewer.
>
> > 2. Create all the separate files with reviews, then mergesort them?
>
> The answer will depend on whether you have three reviews or three
> million, whether each review is twenty words or twenty thousand words,
> and whether you have to do the merging once only or over and over again.
>
> --
> Steven
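
For timing a single pass, something like this timeit sketch should do
(merge_all_files here is a hypothetical stand-in for whatever function
ends up doing the parsing and merging):

import timeit

# One repetition is enough, since in practice the merge only runs once.
t = timeit.Timer("merge_all_files()", "from __main__ import merge_all_files")
print("merge took %.2f seconds" % t.timeit(number=1))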



I merge once. Each review has 3 fields: date, rating, customerid. In
total I'll be parsing between 10K and 100K reviews, eventually 450K.
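
Here is roughly what I have in mind for option 1, keyed on reviewer with
a dict; a minimal sketch, assuming comma-separated lines of
date,rating,customerid (the file names are just placeholders):

import csv
from collections import defaultdict

def merge_reviews(filenames):
    # Map each customerid to a list of its (date, rating) pairs.
    reviews = defaultdict(list)
    for name in filenames:
        f = open(name, 'rb')
        for date, rating, customerid in csv.reader(f):
            reviews[customerid].append((date, rating))
        f.close()
    return reviews

all_reviews = merge_reviews(['reviews1.txt', 'reviews2.txt'])

Since dict lookup is effectively constant time, this stays a single
linear pass over the data no matter how many files there are, so it
should scale to the 450K reviews without a separate mergesort step.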


