Reclaiming (lots of) memory

Nick Craig-Wood nick at craig-wood.com
Mon Oct 4 09:29:58 EDT 2004


Paul Rubin <> wrote:
>  Nick Craig-Wood <nick at craig-wood.com> writes:
> > Just for fun I converted your program to use anydbm.  It then took 135
> > seconds to run (vs 12 for your original code), but only used 4 MB of
> > memory according to top.  The values of the dbm are pickled hashes.
> > The keys could be too, but I decided to just use a tab separated
> > string...
> 
>  But I think it used much more than 4 MB of memory if you count the
>  amount of system cache that held that dbm file.

The dbm file is only 10 MB though, so that's 10 MB of extra memory
used, and memory which *is* reclaimable!
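
For reference, a minimal sketch of what I did (the field names and
file name are made up here, since the original program isn't shown):

    import anydbm
    import cPickle as pickle

    db = anydbm.open('counts.db', 'c')     # on-disk hash, created if missing

    def update(field_a, field_b, record):
        # keys are tab separated strings, values are pickled dicts
        key = '\t'.join([field_a, field_b])
        try:
            data = pickle.loads(db[key])
        except KeyError:
            data = {}
        data.update(record)
        db[key] = pickle.dumps(data, 2)    # protocol 2 keeps values compact

    update('alpha', 'beta', {'count': 1})
    db.close()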

>  Without the caching there'd have had to be real disk seeks for all
>  those dbm updates.  So I think if you had a lot more data, enough
>  that you couldn't keep the dbm in cache any more, you'd get a
>  horrible slowdown.

You only need to keep the dbm's index in cache, and it's designed for
exactly that purpose.

You do get lots of disk I/O when building the dbm though - I suspect
it does an fsync() on every write - but you may be able to turn that
off.
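
For example, if anydbm happens to pick the gdbm backend (an
assumption - it depends on what's installed), opening with the 'f'
flag gives fast mode, which skips the per-write sync:

    import gdbm

    db = gdbm.open('counts.db', 'cf')  # 'c' = create, 'f' = fast mode
    db['key'] = 'value'                # writes are buffered, not synced
    db.sync()                          # flush explicitly, once, at the end
    db.close()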

>  That's why you may be better off with a sorting-based method, that
>  only needs sequential disk operations and not a lot of random
>  seeks.

A sorting-based method will be better for huge datasets that are only
compiled once.  I would have thought a dbm would win for medium-sized
datasets with lots of updates.
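
For the record, the sorting-based version would look something like
this (a sketch, assuming each update can be written out as a
key/count line - the names are again made up):

    import os

    # phase 1: append every update as a "key TAB count" line
    log = open('updates.txt', 'a')
    log.write('alpha\tbeta\t1\n')
    log.close()

    # phase 2: external sort - sequential disk I/O, no random seeks
    os.system('sort updates.txt -o updates.sorted')

    # phase 3: one sequential pass totals the now-adjacent lines per key
    last_key, total = None, 0
    for line in open('updates.sorted'):
        field_a, field_b, count = line.rstrip('\n').split('\t')
        key = (field_a, field_b)
        if key != last_key and last_key is not None:
            print last_key, total
            total = 0
        last_key = key
        total += int(count)
    if last_key is not None:
        print last_key, total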

-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick


