Threading and consuming output from processes

Fri Feb 25 13:24:43 EST 2005

In article <mailman.3032.1109301798.22381.python-list at python.org>,
 Jack Orenstein <jao at geophile.com> wrote:
> I am developing a Python program that submits a command to each node
> of a cluster and consumes the stdout and stderr from each. I want all
> the processes to run in parallel, so I start a thread for each
> node. There could be a lot of output from a node, so I have a thread
> reading each stream, for a total of three threads per node. (I could
> probably reduce to two threads per node by having the process thread
> handle stdout or stderr.)
> 
> I've developed some code and have run into problems using the
> threading module, and have questions at various levels of detail.
> 
> 1) How should I solve this problem? I'm an experienced Java programmer
> but new to Python, so my solution looks very Java-like (hence the use of
> the threading module). Any advice on the right way to approach the
> problem in Python would be useful.
> 
> 2) How many active Python threads is it reasonable to have at one
> time? Our clusters have up to 50 nodes -- is 100-150 threads known to
> work? (I'm using Python 2.2.2 on RedHat 9.)
> 
> 3) I've run into a number of problems with the threading module. My
> program seems to work about 90% of the time. The remaining 10%, it
> looks like notify or notifyAll don't wake up waiting threads; or I
> find some other problem that makes me wonder about the stability of
> the threading module. I can post details on the problems I'm seeing,
> but I thought it would be good to get general feedback
> first. (Googling doesn't turn up any signs of trouble.)

One of my colleagues here wrote a somewhat similar application
in Python, used threads, and had plenty of trouble with it.
I don't recall the details.  Some of the problems could be
specific to Python.  For example, there are some extra
signal-handling issues, though a multithreaded C application
has signal-handling issues of its own.
For my money, you just don't get robust applications when
you solve problems like multiple I/O sources by throwing
threads at them.

As I see another followup has already mentioned, the classic
"pre-threads" solution to multiple I/O sources is the select(2)
function, which allows a single thread to serially process
multiple file descriptors as data becomes available on them.
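When using select(), you should read directly from the file
descriptor, using os.read(fd, size), socketobject.recv(size)
etc., rather than through a file object.  A file object reads
ahead into its own local buffer, and select() only sees what is
still pending on the descriptor.  Data already sitting in that
buffer can leave select() waiting for input you have in fact
already received.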
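Below is a minimal sketch of that single-threaded approach. It rests on a few assumptions. It is written for Python 3 on a POSIX system, because select() on Windows accepts only sockets, not pipes. It also uses the subprocess module, which arrived in Python 2.4 and so postdates the 2.2 in the question. The function name run_all and the 4096-byte read size are illustrative choices, not anything from this thread. For a cluster, each command would presumably be something like ["ssh", node, cmd], but that is an assumption about the setup.

```python
import os
import select
import subprocess
import sys


def run_all(commands):
    """Run every command in parallel and collect stdout and stderr
    from all of them in ONE thread, using select() and os.read().

    Hypothetical helper for illustration; POSIX only.
    Returns a list of (returncode, stdout_bytes, stderr_bytes).
    """
    procs = []
    outputs = []
    fd_map = {}  # raw fd -> (process index, "stdout" or "stderr")
    for i, cmd in enumerate(commands):
        p = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE)
        procs.append(p)
        outputs.append({"stdout": [], "stderr": []})
        fd_map[p.stdout.fileno()] = (i, "stdout")
        fd_map[p.stderr.fileno()] = (i, "stderr")

    open_fds = set(fd_map)
    while open_fds:
        # Block until at least one pipe has data (or has hit EOF).
        readable, _, _ = select.select(list(open_fds), [], [])
        for fd in readable:
            # Read the raw descriptor, not the file object, so no data
            # is hidden in a Python-level buffer that select() can't see.
            chunk = os.read(fd, 4096)  # 4096 is an arbitrary chunk size
            i, name = fd_map[fd]
            if chunk:
                outputs[i][name].append(chunk)
            else:
                open_fds.discard(fd)  # empty read means EOF on this pipe

    results = []
    for p, out in zip(procs, outputs):
        p.wait()  # reap the child; its pipes are already drained
        p.stdout.close()
        p.stderr.close()
        results.append((p.returncode,
                        b"".join(out["stdout"]),
                        b"".join(out["stderr"])))
    return results


if __name__ == "__main__":
    # Stand-in for per-node commands: local Python children.
    cmds = [[sys.executable, "-c",
             "import sys; print('node %d'); sys.stderr.write('warn\\n')" % n]
            for n in range(3)]
    for rc, out, err in run_all(cmds):
        print(rc, out, err)
```

Because every pipe is drained as soon as it becomes readable, a child that writes a lot to one stream cannot stall while the parent waits on the other. This is the deadlock that motivates one reader thread per stream in the threaded design.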

   Donn Cave, donn at u.washington.edu