help! multi-threading problem on hyperthreading smp linux server

Garry Hodgson garry at research.att.com
Tue Feb 10 12:36:42 EST 2004


a colleague of mine has seen an odd problem in some code of ours. 
we initially noticed it on webware, but in distilling a test case it seems
to be strictly a python issue.  in the real system, it manifests as
webware just locking up, for no apparent reason, until we kill it.
we've also had the python interpreter running webware die on occasion.
it works fine on our desktop linux and windows machines, but fails
on the production hardware.  the main difference is that the
production hardware is a dual Xeon machine with hyperthreading
enabled.

has anyone run into problems like this, or have any clues what
the problem might be?  we pushed pretty hard on a skeptical 
project manager to get python and webware accepted for this
project, and it's embarrassing to have it acting flaky.

i'd really appreciate any insight anyone's got on this.  we need
to resolve it quickly.  mike's description of the test case follows.

thanks


"Michael W. Balk" wrote:

> You asked me yesterday to give you some info on the multi-threading problem
> so that you could post a query on comp.lang.python.
> 
> Here is what I know so far.
> 
> The machine has two Xeon CPUs, both with hyper-threading enabled.
> The linux kernel running is:  2.4.20-8smp
> 
> The pure python test I ran did the following:
> 
> Starts up 50 threads (using the threading module).
> Each thread calculates 500 random numbers and writes them to a file,
> repeating this 50 times to generate 50 output files.  So there are 2500
> files expected once all 50 threads complete.
> 
> The observation is that only a few threads actually produce output files,
> and none of those threads produces all 50 of the files it is to generate.
> 
> The main thread does a join on each of the 50 threads sequentially, so the
> program should exit once the last thread completes.  However, the
> observation is that the main thread never exits, presumably because it is
> blocked in join() on a child thread whose run() method never returns.
> 
> Now if I change the test so that a single thread runs the test 50 times
> sequentially, all expected files are produced and the program terminates normally.
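
for anyone who wants to try this on their own hardware, here is a minimal
sketch of the test as described above.  it is reconstructed from the
description, not taken from mike's actual script, so the function names,
the file naming scheme, the use of the module-level random.random(), and
the temporary output directory are all assumptions.  it is written for a
current python 3, not the interpreter we were running at the time.

```python
import os
import random
import tempfile
import threading

# Figures taken from the description of the test.
NUM_THREADS = 50
FILES_PER_THREAD = 50
NUMBERS_PER_FILE = 500


def write_files(out_dir, worker_id):
    """Write FILES_PER_THREAD files of NUMBERS_PER_FILE random numbers each."""
    for file_index in range(FILES_PER_THREAD):
        # Hypothetical naming scheme: one distinct file per (worker, index) pair.
        name = "t%02d_f%02d.txt" % (worker_id, file_index)
        with open(os.path.join(out_dir, name), "w") as out:
            for _ in range(NUMBERS_PER_FILE):
                out.write("%r\n" % random.random())


def run_threaded(out_dir):
    """Threaded variant: start all workers, then join each one in turn."""
    threads = [
        threading.Thread(target=write_files, args=(out_dir, i))
        for i in range(NUM_THREADS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # blocks forever if a worker's target never returns
    return len(os.listdir(out_dir))


def run_sequential(out_dir):
    """Single-threaded variant: the same work, one worker after another."""
    for i in range(NUM_THREADS):
        write_files(out_dir, i)
    return len(os.listdir(out_dir))


if __name__ == "__main__":
    expected = NUM_THREADS * FILES_PER_THREAD
    with tempfile.TemporaryDirectory() as out_dir:
        found = run_threaded(out_dir)
        print("threaded: %d of %d files" % (found, expected))
    with tempfile.TemporaryDirectory() as out_dir:
        found = run_sequential(out_dir)
        print("sequential: %d of %d files" % (found, expected))
```

on a healthy machine both variants should report all 2500 files.  the
failure described above would show up as the threaded run never getting
past join(), so the script would hang rather than print a short count.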


----
Garry Hodgson, Technology Consultant, AT&T Labs

Be happy for this moment.
This moment is your life.



