Threading and consuming output from processes

Jack Orenstein jao at geophile.com
Thu Feb 24 22:18:37 EST 2005


I am developing a Python program that submits a command to each node
of a cluster and consumes the stdout and stderr from each. I want all
the processes to run in parallel, so I start a thread for each
node. There could be a lot of output from a node, so I have a thread
reading each stream, for a total of three threads per node. (I could
probably reduce to two threads per node by having the process thread
handle stdout or stderr.)
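
A minimal sketch of the layout described above, for illustration only and not the code from my program. It assumes a current Python 3 with the subprocess module, which is not available in Python 2.2.2. The names drain, run_on_node, run_everywhere and the make_command callback are hypothetical. In a real cluster, make_command would presumably build an ssh or rsh command line for the node.

```python
import subprocess
import sys
import threading


def drain(stream, sink):
    """Read a pipe line by line until EOF so the child never blocks on a full pipe."""
    for line in stream:
        sink.append(line)
    stream.close()


def run_on_node(node, command, results):
    """Process thread: start the command, then let two reader threads consume its output."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, text=True)
    out, err = [], []
    readers = [threading.Thread(target=drain, args=(proc.stdout, out)),
               threading.Thread(target=drain, args=(proc.stderr, err))]
    for reader in readers:
        reader.start()
    for reader in readers:
        reader.join()
    # Each thread writes a distinct key, so no extra locking is used here.
    results[node] = (proc.wait(), "".join(out), "".join(err))


def run_everywhere(nodes, make_command):
    """Start one process thread per node and wait for all of them to finish."""
    results = {}
    workers = [threading.Thread(target=run_on_node,
                                args=(node, make_command(node), results))
               for node in nodes]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    # Stand-in command: a local Python child that echoes the node name.
    def local_echo(node):
        return [sys.executable, "-c", "import sys; print(sys.argv[1])", node]

    for node, (status, out, err) in run_everywhere(["node1", "node2"], local_echo).items():
        print(node, status, out.strip())
```

This version uses join to wait for the readers, so it needs no condition variables at all. Having the process thread read stdout itself would bring it down to two threads per node.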

I've developed some code and have run into problems using the
threading module, and have questions at various levels of detail.

1) How should I solve this problem? I'm an experienced Java programmer
but new to Python, so my solution looks very Java-like (hence the use of
the threading module). Any advice on the right way to approach the
problem in Python would be useful.

2) How many active Python threads is it reasonable to have at one
time? Our clusters have up to 50 nodes -- is 100-150 threads known to
work? (I'm using Python 2.2.2 on RedHat 9.)

3) I've run into a number of problems with the threading module. My
program works about 90% of the time. In the remaining 10%, it looks
like notify or notifyAll fails to wake up waiting threads, or I hit
some other problem that makes me wonder about the stability of the
threading module. I can post details on the problems I'm seeing, but
I thought it would be good to get general feedback first. (Googling
doesn't turn up any signs of trouble.)
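
For reference, here is a sketch of the usual way to wait on a threading.Condition. Whether it has anything to do with the failures described above is not known. It is written for a current Python 3, where notify_all is the modern spelling of notifyAll. The Mailbox class is hypothetical. The waiter checks its predicate in a loop while holding the lock, so a notify that arrives before the wait begins is not lost.

```python
import threading


class Mailbox:
    """Hypothetical hand-off queue between producer and consumer threads."""

    def __init__(self):
        self._cond = threading.Condition()
        self._items = []

    def put(self, item):
        # Change the shared state and notify while holding the lock.
        with self._cond:
            self._items.append(item)
            self._cond.notify_all()

    def get(self):
        with self._cond:
            # Re-check the predicate after every wakeup; wait() releases the
            # lock while blocked and reacquires it before returning.
            while not self._items:
                self._cond.wait()
            return self._items.pop(0)


if __name__ == "__main__":
    box = Mailbox()
    consumer = threading.Thread(target=lambda: print(box.get()))
    consumer.start()
    box.put("a line of output")
    consumer.join()
```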

Thanks.

Jack Orenstein



