advice on sub-classing multiprocessing.Process and multiprocessing.BaseManager

Matt Newville newville at cars.uchicago.edu
Mon Mar 24 16:24:34 EDT 2014


I'm maintaining a python interface to a C library for a distributed
control system (EPICS, sort of a SCADA system) that does a large
amount of relatively light-weight network I/O.   In order to keep many
connections open and responsive, and to provide a simple interface,
the python library keeps a global store of connection state.

This works well for single processes and threads, but not so well for
multiprocessing, where the global state causes trouble.  The issue is
not too difficult to work around (i.e., completely clear any such global
cache and insist that new connections be established in each process),
but the workaround is easy to forget.

To make this easier, I have a function to clear the global cache,

def clear_my_cache():
    # empty global variables caching network connections
    return

and then a subclass of multiprocessing.Process like:

import multiprocessing

class MyProcess(multiprocessing.Process):
    def __init__(self, **kws):
        multiprocessing.Process.__init__(self, **kws)

    def run(self):
        # runs in the child process: clear inherited state first
        clear_my_cache()
        multiprocessing.Process.run(self)

This works fine.  I can subclass multiprocessing.pool.Pool too, as it
uses Process as a class variable (removing doc strings):

class Pool(object):
    Process = Process

    def __init__(self, processes=None, initializer=None, initargs=(),
                 maxtasksperchild=None):

and then uses self.Process in its Pool._repopulate_pool().  That makes
subclassing Pool as easy as:

import multiprocessing.pool

class MyPool(multiprocessing.pool.Pool):
    def __init__(self, **kws):
        # have the pool create MyProcess workers
        self.Process = MyProcess
        multiprocessing.pool.Pool.__init__(self, **kws)

I'm very pleased to need so little code here!  But, I run into trouble
when I try to subclass any of the Managers().  It looks like I would
have to make a nearly-identical copy of ~30 lines of
BaseManager.start() as it calls multiprocessing.Process() to create
processes there.   In addition, it looks like subclassing
multiprocessing.managers.SyncManager would mean making a
near-identical copy of a similar amount of code.

I'd be willing to do this, but it seems like a bad idea; I would much
prefer overriding self.Process, as for Pool.

Does anyone have any advice for the best approach here?  Should
BaseManager, like Pool, also use a class variable (Process = Process)?

Thanks in advance for any advice.

--Matt


