Multiple processes, one code
Peter Hansen
peter at engcorp.com
Fri Feb 7 08:29:41 EST 2003
Scott Ransom wrote:
>
> I am using Python to control a pulsar search code that performs
> a series of very CPU intensive operations on a list of files
> (radio data). The code runs on dual processor nodes of a
> Beowulf cluster and the jobs are submitted through a batch
> system. Each processor of a node executes an identical Python
> script (via the batch system) and the processes need to
> coordinate to evenly split up the files to be processed on the
> node.
>
> My current solution for this coordination seems like an
> incredible kludge and I can't help thinking there is a better
> way. I currently do it by creating and locking a temporary file
> something like this:
I'm not sure, but wouldn't it be fairly easy to have each script
attempt to bind to a specific port on the machine? If it
succeeds, that script becomes a local server for the other one,
which automatically becomes the client.
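A minimal sketch of that election, assuming an agreed-upon port number (50937 here is arbitrary) that both scripts know in advance: whichever script binds first wins and listens; the other gets "address already in use" and connects instead.

```python
import socket

PORT = 50937  # arbitrary choice; any unused port both scripts agree on works


def elect_role(port=PORT):
    """Try to bind the port: winner becomes the server, loser the client."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen(1)
        return "server", sock
    except OSError:
        # Port already taken: the other script got there first,
        # so connect to it as the client instead.
        sock.close()
        return "client", socket.create_connection(("127.0.0.1", port))
```

Both copies of the script call `elect_role()` at startup and branch on the returned role; no temporary lock file is needed.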
If the two maintain a socket connection, then any failure by one
(though with try/finally in place I don't understand why you should
end up with leftover files from a "problem job") should cause the
socket to close, letting the other side learn of the failure and
do its cleanup.
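The failure-detection part can be sketched too, assuming the two processes already hold a TCP connection between them: when one side's process exits (cleanly or not), the OS closes its end, and the surviving peer's recv() returns an empty string, which is its cue to clean up.

```python
import socket

# Set up a connected pair on an ephemeral port (port 0 = let the OS pick),
# standing in for the server/client pair established at startup.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

client = socket.create_connection(("127.0.0.1", port))
conn, _ = srv.accept()

client.close()  # simulate the other process dying: its socket closes

data = conn.recv(1024)  # an empty read means the peer is gone
if data == b"":
    pass  # the surviving side would do its cleanup here
```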
-Peter