Processing huge datasets

Terry Reedy tjreedy at udel.edu
Mon May 10 13:22:06 EDT 2004


"Anders Søndergaard" <anders.soendergaard at nokia.com> wrote in message
news:79Knc.15953$k4.322398 at news1.nokia.com...
> I'm trying to process a large filesystem (+20 million files) and keep the
> directories along with summarized information about the files (sizes,
> modification times, newest file and the like) in an instance hierarchy
> in memory. I read the information from a Berkeley Database.

> Is there a clever way of processing huge datasets in Python?
> How would a smart Python programmer approach the problem?
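
For concreteness: the stdlib bsddb module gives a dict-like view of such a
database.  A rough sketch of the reading loop, assuming a hash-format file;
the filename and the record handling are guesses, since the post doesn't
show them:

import bsddb

db = bsddb.hashopen('filestats.db', 'r')   # assumed filename; btopen for btree files
count = 0
for path in db.keys():        # keys() builds one big list; fine for a sketch
    record = db[path]         # raw string; unpack per the poster's own format
    count = count + 1
db.close()
print count

The interesting part is what happens per record: each one has to be
unpacked and folded into the in-memory hierarchy, which is where the
memory question comes in.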

I would start with 2 gigs of RAM: after setting aside about 200 megs for
the OS and interpreter, the remaining 1.8 gigs or so works out to roughly
90 bytes per entry for 20+ million entries.  Even that might not be
enough; an ordinary class instance carries its own attribute dict and
already costs well over 90 bytes.
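
One way to get under that kind of budget is __slots__, which drops the
per-instance attribute dict.  A minimal sketch; the class and field names
are invented for illustration, since the post doesn't give the actual
layout:

class DirSummary(object):
    # __slots__ suppresses the per-instance __dict__, the main
    # memory cost when millions of instances are alive at once.
    __slots__ = ('total_size', 'file_count', 'newest_mtime', 'children')

    def __init__(self):
        self.total_size = 0     # sum of file sizes, in bytes
        self.file_count = 0
        self.newest_mtime = 0   # mtime of the newest file seen so far
        self.children = None    # name -> DirSummary dict, created on demand

    def add_file(self, size, mtime):
        # Fold one file's stats into this directory's summary.
        self.total_size = self.total_size + size
        self.file_count = self.file_count + 1
        if mtime > self.newest_mtime:
            self.newest_mtime = mtime

Even then, each instance plus its attribute values costs a few dozen bytes,
so tuples or parallel arrays (the array module) may be needed to actually
fit; measuring a small sample first would tell.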

tjr
