Multiple processes, one code

Peter Hansen peter at engcorp.com
Fri Feb 7 08:29:41 EST 2003


Scott Ransom wrote:
> 
> I am using Python to control a pulsar search code that performs
> a series of very CPU intensive operations on a list of files
> (radio data).  The code runs on dual processor nodes of a
> Beowulf cluster and the jobs are submitted through a batch
> system.  Each processor of a node executes an identical Python
> script (via the batch system) and the processes need to
> coordinate to evenly split up the files to be processed on the
> node.
> 
> My current solution for this coordination seems like an
> incredible kludge and I can't help thinking there is a better
> way.  I currently do it by creating and locking a temporary file
> something like this:

I'm not sure, but wouldn't it be fairly easy to have each script
attempt to bind to a specific port on the machine?  Whichever one
succeeds becomes a local server for the other, which automatically
becomes the client.
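A minimal sketch of that bind-or-connect idea (the port number is an arbitrary choice for illustration; both scripts just need to agree on it):

```python
import socket

PORT = 43210  # arbitrary illustrative port; any agreed-upon free port works

def become_server_or_client(port=PORT):
    """Race to bind the port: the winner is the server, the loser connects."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        sock.bind(('127.0.0.1', port))
        sock.listen(1)
        return 'server', sock
    except OSError:
        # Port already taken: the peer got there first, so become the client.
        sock.close()
        return 'client', socket.create_connection(('127.0.0.1', port))
```

The server process could then accept the connection and hand out filenames over it, so the file list is split without any lock file.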

If the two maintain a socket connection, then any failure by one
(though with try/finally in place I don't understand why a
"problem job" should leave leftover files) should close the
socket, so the other side learns about it and can do the cleanup.
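The failure-detection part falls out of TCP semantics: when one end of a connection is closed (including by the process dying), a blocking recv() on the other end returns an empty byte string. A tiny demonstration, using socketpair() to stand in for the server/client connection:

```python
import socket

a, b = socket.socketpair()  # stands in for the two coordinated processes
a.close()                   # simulate one process exiting, cleanly or not
data = b.recv(1024)         # peer's death shows up as end-of-stream
assert data == b''          # empty read: time for the survivor to clean up
b.close()
```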

-Peter
