Processing huge datasets

Terry Reedy tjreedy at udel.edu
Mon May 10 13:22:06 EDT 2004


"Anders Søndergaard" <anders.soendergaard at nokia.com> wrote in message
news:79Knc.15953$k4.322398 at news1.nokia.com...
> I'm trying to process a large filesystem (+20 million files) and keep the
> directories along with summarized information about the files (sizes,
> modification times, newest file and the like) in an instance hierarchy
> in memory. I read the information from a Berkeley Database.

> Is there a clever way of processing huge datasets in Python?
> How would a smart Python programmer approach the problem?
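
For concreteness: the stdlib bsddb module gives a dict-like view of such a
database.  A rough sketch of the reading loop, assuming a hash-format file;
the filename and the record handling are guesses, since the post doesn't
show them:

import bsddb

db = bsddb.hashopen('filestats.db', 'r')   # assumed filename; btopen for btree files
count = 0
for path in db.keys():        # keys() builds one big list; fine for a sketch
    record = db[path]         # raw string; unpack per the poster's own format
    count = count + 1
db.close()
print count

The interesting part is what happens per record: each one has to be
unpacked and folded into the in-memory hierarchy, which is where the
memory question comes in.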

I would start with 2 gigs of RAM: after setting aside about 200 megs for
the OS and interpreter, the remaining 1.8 gigs or so works out to roughly
90 bytes per entry for 20+ million entries.  Even that might not be
enough; an ordinary class instance carries its own attribute dict and
already costs well over 90 bytes.
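
One way to get under that kind of budget is __slots__, which drops the
per-instance attribute dict.  A minimal sketch; the class and field names
are invented for illustration, since the post doesn't give the actual
layout:

class DirSummary(object):
    # __slots__ suppresses the per-instance __dict__, the main
    # memory cost when millions of instances are alive at once.
    __slots__ = ('total_size', 'file_count', 'newest_mtime', 'children')

    def __init__(self):
        self.total_size = 0     # sum of file sizes, in bytes
        self.file_count = 0
        self.newest_mtime = 0   # mtime of the newest file seen so far
        self.children = None    # name -> DirSummary dict, created on demand

    def add_file(self, size, mtime):
        # Fold one file's stats into this directory's summary.
        self.total_size = self.total_size + size
        self.file_count = self.file_count + 1
        if mtime > self.newest_mtime:
            self.newest_mtime = mtime

Even then, each instance plus its attribute values costs a few dozen bytes,
so tuples or parallel arrays (the array module) may be needed to actually
fit; measuring a small sample first would tell.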

tjr
