Shared memory python between two separate shell-launched processes

Charles Fox (Sheffield) charles.fox at gmail.com
Thu Feb 10 12:21:44 EST 2011


On Feb 10, 3:43 pm, Jean-Paul Calderone <calderone.jeanp... at gmail.com> wrote:
> On Feb 10, 9:30 am, "Charles Fox (Sheffield)" <charles.... at gmail.com> wrote:
>
> > Hi guys,
> > I'm working on debugging a large Python simulation which begins by
> > preloading a huge cache of data.  I want to step through code on many
> > runs to do the debugging.  Problem is that it takes 20 seconds to
> > load the cache at each launch.  (Cache is a dict in a 200MB cPickle
> > binary file.)
> >
> > So to speed up the compile-test cycle I'm thinking about running a
> > completely separate process (not a fork, but a process launched from
> > a different terminal).
>
> Why _not_ fork?  Load up your data, then go into a loop, forking and
> loading/running the rest of your code in the child.  This should be
> really easy to implement compared to doing something with shared
> memory, and it solves the problem you're trying to solve, the long
> startup time, just as well.  It also protects you from possible bugs
> where the data gets corrupted by the code that operates on it, since
> there's only one copy shared amongst all your tests.  Is there some
> other benefit that the shared memory approach gives you?
>
> Of course, adding unit tests that exercise your code on a smaller
> data set might also be a way to speed up development.
>
> Jean-Paul
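
For reference, a minimal sketch of the fork-and-loop pattern Jean-Paul
describes, assuming Python 2 on a POSIX system, with 'cache.pkl' and
mymodule as hypothetical stand-ins for the real cache file and the code
under test:

import os
import cPickle

# Load the expensive cache exactly once, in the long-lived parent.
with open('cache.pkl', 'rb') as f:        # hypothetical cache file
    cache = cPickle.load(f)

while True:
    raw_input('Press Enter to run the code under test ')
    pid = os.fork()
    if pid == 0:
        # Child: sees the already-loaded cache via copy-on-write,
        # so there is no 20-second reload on each run.
        import mymodule                   # hypothetical module under test
        mymodule.run(cache)
        os._exit(0)                       # skip normal interpreter teardown
    os.waitpid(pid, 0)                    # parent waits, then loops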



Thanks Jean-Paul, I'll have a think about this.  I'm not sure it will
get me exactly what I want, though, as I would need to keep unloading
my development module and reloading it, all within the forked process,
and I don't see how my debugger (and emacs pdb tracking) will keep up
with that to let me step through the code.  (This debugging is more
about integration issues than single functions; I have a bunch of unit
tests for the little bits, but something is unhappy when I put them
all together...)
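
(Thinking about it, one property of the fork approach might help here:
if the parent never imports my development module, each forked child
would do a fresh import and pick up whatever code is currently on disk,
with no unloading or reload() step, and pdb could be started inside the
child.  Something like this, reusing the sketch above:

import pdb

def run_child(cache):
    # Intended for the child branch of the fork loop sketched above.
    # The parent never imports mymodule, so this import always loads
    # the current code from disk; no unloading or reload() needed.
    import mymodule                       # hypothetical module under test
    pdb.runcall(mymodule.run, cache)      # step through in the debugger
)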

(I also had a reply by email suggesting I use /dev/shm to store the
data instead of the hard disc; this speeds things up a little, but not
much, as the data still has to be transferred in bulk into my process.
Unless I'm missing something and my process can just access the data
in that shm without having to load its own copy?)
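
For the raw bytes, the answer appears to be yes: a process can mmap a
file sitting on the /dev/shm tmpfs without a bulk copy.  But a mapping
holds only bytes; a pickled dict would still have to be unpickled into
fresh Python objects, which is likely where most of the 20 seconds
goes.  A minimal sketch, assuming a hypothetical file
/dev/shm/cache.bin:

import mmap
import os

# Map a file living on the /dev/shm tmpfs straight into this process's
# address space; the kernel shares the pages instead of copying them.
fd = os.open('/dev/shm/cache.bin', os.O_RDONLY)   # hypothetical file
size = os.fstat(fd).st_size
buf = mmap.mmap(fd, size, prot=mmap.PROT_READ)

# buf behaves like a read-only byte string backed by shared pages, but
# it holds only bytes: a pickled dict would still need cPickle.loads()
# to turn it into live objects, and that step copies.
header = buf[:16]

buf.close()
os.close(fd)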


