Profiling memory usage?

Emile van Sebille emile at fenx.com
Wed Feb 27 11:31:40 EST 2002


"Michael James Barber" <mjbarber at ascc.artsci.wustl.edu> wrote in message
news:a5iogb$laa at ascc.artsci.wustl.edu...
> I'm currently looking to optimize a couple of my python programs in
terms
> of memory usage.  I think I know what I need to improve, but I suspect
> that "profile before optimizing" applies to memory usage as well as to
> speed.
>
> Is there something analogous to the profile module for describing how
much
> memory is used by e.g. instances of different classes?  I didn't find
> anything using Google or in the Vaults of Parnassus.
>

I also looked for a tool and found none.  One of my projects requires a
large (1+ GB) memory footprint.  In working with a test data set (~10% of
the full set), I needed to balance speed against memory.  I started with
my test requiring 272 MB and running in 2.984 seconds.  Through
progressive stages, I managed to take it down to 105 MB and 2.384
seconds.  Since there weren't good internal profiling tools, I watched
memory usage with pstat.exe on Windows, but you could probably use top
in a similarly kludgey output-parsing way.  (Now someone can chime in
with the *easy* way to do this ;-)  I then added debugging output that
reported memory usage in what felt like memory-consuming areas, checking
befores and afters, refactoring and restructuring, etc., running the
test in between.  After six iterations or so I was at 150 MB and 2.73
seconds, and I got to the final numbers five additional steps later.
The whole process took about a day.
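
For what it's worth, the "report memory at interesting points" debugging
output amounted to something like the sketch below.  It assumes a
Linux-style /proc filesystem (on Windows I was parsing pstat.exe output
instead), and the function name and the choice of VmRSS are just
illustrative, not exactly what I used.

import os

def report_memory(label):
    """Print the resident set size of the current process, in MB."""
    try:
        f = open('/proc/%d/status' % os.getpid())
    except IOError:
        print('%s: no /proc on this platform' % label)
        return
    for line in f:
        if line.startswith('VmRSS:'):
            kb = int(line.split()[1])          # value is reported in kB
            print('%s: %.1f MB' % (label, kb / 1024.0))
            break
    f.close()

report_memory('before building the index')
big = list(range(1000000))                     # stand-in for a large structure
report_memory('after building the index')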

The kinds of changes I made included:
  -- not holding on to references to large structures once I was done
     with them (doh)
  -- separating the values and keys previously held in dicts by
     switching the values to point into indexed Numeric structures
     (sketched below)
  -- switching to a dual-pass process, building the structures outside
     the run environment and reading them in later, to avoid having the
     source and the prepared data in the same process.  With the full
     data set, the prep process spikes usage to 1.25 GB, while the run
     process uses only 850 MB.  (sketched below)
  -- forcing commonly used string parts of keys to be interned (sketched
     below)
  -- writing custom disk storage routines as opposed to using cPickle or
     marshal, for both speed and memory footprint considerations
     (sketched below)
  -- rewriting certain areas to allow for "missing" data so it didn't
     need to pre-allocate memory
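
To make the dict/Numeric split concrete: a rough sketch of keeping only
small integer indices in the dict, with the bulky values packed into one
flat numeric block.  I'm using the standard array module here so the
example runs without Numeric, and the names are made up.

from array import array

values = array('d')            # one compact block of doubles (Numeric-style)
index_by_key = {}              # key -> position in 'values'

def store(key, number):
    index_by_key[key] = len(values)
    values.append(number)

def lookup(key):
    return values[index_by_key[key]]

store('widget-a', 3.14)
store('widget-b', 2.71)
print(lookup('widget-b'))      # -> 2.71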
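
The dual-pass split looked roughly like this.  The function names and
the single pickled file are invented for the example (and the real code
used the custom storage routines rather than pickle); the point is that
the raw source and the prepared data never share one process's memory.

import pickle                  # stand-in for the custom storage routines

def build_structures(source_path):
    # placeholder for the memory-hungry parse of the raw source
    return {'rows': list(range(10))}

def do_the_work(data):
    print('loaded %d rows' % len(data['rows']))

def prepare(source_path, prepared_path):
    data = build_structures(source_path)       # prep process: big spike
    f = open(prepared_path, 'wb')
    pickle.dump(data, f)
    f.close()

def run(prepared_path):
    f = open(prepared_path, 'rb')
    data = pickle.load(f)                       # run process: prepared data only
    f.close()
    do_the_work(data)

prepare('source.dat', 'prepared.dat')
run('prepared.dat')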
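
Interning is nearly a one-liner per string.  In today's Python intern()
is a builtin; the sketch below also covers newer versions where it lives
in sys.intern.  The key layout is invented.

try:
    from sys import intern     # newer Pythons
except ImportError:
    pass                       # current Python: intern() is already a builtin

records = {}
category = ['inven', 'tory']
for i in range(1000):
    # A string built at run time is not interned automatically; intern()
    # makes every iteration share one copy instead of keeping 1000.
    prefix = intern(''.join(category))
    records[(prefix, i)] = i * 2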
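
And the custom disk storage was nothing fancier than fixed-format binary
records, roughly along these lines; the '<id' record layout (a 4-byte
int plus an 8-byte float per record) is invented for the example.

import struct

RECORD = struct.Struct('<id')      # little-endian: int32, float64

def save(path, pairs):
    f = open(path, 'wb')
    for key, value in pairs:
        f.write(RECORD.pack(key, value))
    f.close()

def load(path):
    f = open(path, 'rb')
    data = f.read()
    f.close()
    return [RECORD.unpack_from(data, off)
            for off in range(0, len(data), RECORD.size)]

save('table.bin', [(1, 3.14), (2, 2.71)])
print(load('table.bin'))           # -> [(1, 3.14), (2, 2.71)]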

None of this is the kind of fine-grained introspection I'd like to have
access to, but it certainly worked when I found myself holding the
proverbial "5-pound bag".  ;-)

HTH,

--

Emile van Sebille
emile at fenx.com
