[Tutor] memory management

David Ascher da@ski.org
Sun, 18 Apr 1999 13:38:24 -0700 (Pacific Daylight Time)


On Sun, 18 Apr 1999, Arne Mueller wrote:

> Hm, 100 MB memory usage is not a big problem, 200 MB is (in my case)!
> However, in summary: I have to read in data. Whether a dataset is read
> into memory depends on what is already in the data structure; it may
> replace an existing dataset or not, so I have to compare each new dataset
> to all existing datasets. After reading in the data, a sorting is
> applied to the structure, and the sorted data structure is written to a file.

Something which may help speed things up is to use a hash table: instead
of comparing each dataset to every other dataset, compare the datasets'
hash values. These will be equal whenever the datasets are equal, so you
can use them as a fast filter and only fully load the datasets that share
a hash value.
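
To make that concrete, here is a minimal sketch (in present-day Python,
assuming each dataset lives in its own plain file; the function names are
made up for illustration):

    import hashlib

    def file_hash(path):
        # Read the file in chunks so only a small piece is in memory at a time.
        h = hashlib.md5()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(65536), b''):
                h.update(chunk)
        return h.hexdigest()

    def group_by_hash(paths):
        # Map hash value -> list of files that share it; only files within
        # the same group need a full byte-for-byte comparison later.
        groups = {}
        for path in paths:
            groups.setdefault(file_hash(path), []).append(path)
        return groups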

Clearly, the datasets need to be loaded to compute their hash values, but
that can be done sequentially (unload a dataset before you load the next),
and the hash values can be stored on disk, so you only have to recompute a
hash value when its dataset changes.
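
A rough sketch of that caching idea, assuming the stored hashes go in a
pickle file keyed by filename and modification time (the file name and
helpers are hypothetical, and file_hash() is the function from the sketch
above):

    import os
    import pickle

    CACHE_FILE = 'hashes.pkl'  # hypothetical location for the stored hash values

    def load_cache():
        try:
            with open(CACHE_FILE, 'rb') as f:
                return pickle.load(f)
        except OSError:
            return {}

    def cached_hash(path, cache):
        # Recompute the hash only when the file's modification time has changed.
        mtime = os.path.getmtime(path)
        entry = cache.get(path)
        if entry and entry[0] == mtime:
            return entry[1]
        value = file_hash(path)
        cache[path] = (mtime, value)
        return value

    def save_cache(cache):
        with open(CACHE_FILE, 'wb') as f:
            pickle.dump(cache, f)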

Whether this is a useful trick depends on the specifics of your problem,
of course.

Cheers,

--David Ascher