[py-dev] xdist and thread-safe resource counting

holger krekel holger at merlinux.eu
Fri Jan 20 09:50:18 CET 2012


Hi Eli,

interesting problem.

On Wed, Jan 18, 2012 at 20:55 -0800, Ateljevich, Eli wrote:
> I have a question about managing resources in a threadsafe way across xdist -n.
> 
> My group is using py.test as a high-level driver for testing an mpi-based numerical code. Many of our system-level tests wrap a system call to mpirun then postprocess results. I have a decorator for the tests that hints at the number of processors needed (usually something like 1,2,8).
> 
> I would like to launch as much as I can at once given the available processors. For instance, if 16 processors are available there is no reason I couldn't be doing a 12 and a 4 processor test. I was thinking of using xdist with some modest number of processors representing the maximum number of concurrent tests. The xdist test processors would launch mpi jobs when enough processors become available to satisfy the np hint for that test. This would be managed by having the tests "check out" cores and sleep if they aren't available yet.
> 
> This design requires a threadsafe method to query, acquire and lock the count of available mpi cores. I could use some sort of lock or semaphore from threading, but I thought it would be good to run this by the xdist cognoscenti and find out if there might be a preferred way of doing this given how xdist itself distributes its work or manages threads.

pytest-xdist itself does not provide or use a method to query the number
of available processors.  Quick background on xdist: the master process starts
a number of worker processes which each collect tests (see the output of
py.test --collectonly), and the master sees the test ids of all those
collections.  It then decides on the scheduling (Each or Load at the moment;
"-n5" implies load-balancing) and sends test ids to the nodes for execution.
It pre-loads each node with some test ids and then waits for completions
before sending more.  There is no node-to-node communication for
co-ordination.

It might be easiest not to extend the xdist mechanisms but to implement an
independent method which co-ordinates the number of running MPI tests /
used processors via a file or similar.  For example, on POSIX you can read
and write a file carrying some meta-information and use the atomic
os.rename operation to update it.  I am not sure about the exact semantics,
but this should be doable and testable without any xdist involvement.
Once you have such a method for restricting the number of MPI processes,
you can use it from a pytest_runtest_setup hook which reads your
decorator attributes/markers and then decides whether to wait or to run
the test.  This approach also makes you largely independent of the number
of worker processes started with "-nNUM".
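A minimal sketch of what such a coordination method could look like,
assuming a shared working directory on one machine: the file names, the
TOTAL_CORES constant, and the "mpi_np" marker name are all made up for
illustration.  It takes an exclusive-create lock file to serialize
updates and uses atomic os.rename to publish the new count, roughly as
suggested above:

```python
import os
import time
import tempfile

# Hypothetical file names and core count -- adjust to your setup.
COUNT_FILE = "mpi_cores.txt"        # holds number of cores currently in use
LOCK_FILE = COUNT_FILE + ".lock"
TOTAL_CORES = 16

def _locked_update(update):
    """Apply `update(used) -> (new_used, result)` under an exclusive lock."""
    while True:
        try:
            # O_CREAT|O_EXCL fails atomically if another process holds the lock
            fd = os.open(LOCK_FILE, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            time.sleep(0.05)
    try:
        try:
            with open(COUNT_FILE) as f:
                used = int(f.read())
        except FileNotFoundError:
            used = 0
        new_used, result = update(used)
        # write to a temp file, then atomically rename it into place
        tmp_fd, tmp_path = tempfile.mkstemp(dir=".")
        with os.fdopen(tmp_fd, "w") as f:
            f.write(str(new_used))
        os.rename(tmp_path, COUNT_FILE)
        return result
    finally:
        os.close(fd)
        os.remove(LOCK_FILE)

def try_acquire(n):
    """Reserve n cores; return True on success, False if not enough free."""
    def update(used):
        if used + n <= TOTAL_CORES:
            return used + n, True
        return used, False
    return _locked_update(update)

def release(n):
    """Give n cores back."""
    _locked_update(lambda used: (max(used - n, 0), None))

def pytest_runtest_setup(item):
    # "mpi_np" is an assumed marker name carrying the processor hint;
    # get_closest_marker is the modern pytest marker API.
    marker = item.get_closest_marker("mpi_np")
    n = marker.args[0] if marker else 1
    while not try_acquire(n):
        time.sleep(1)   # sleep until enough cores are checked back in
```

A matching release(n) call would go in pytest_runtest_teardown so a
crashed or finished test frees its cores for the other worker processes.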

HTH,
holger



More information about the Pytest-dev mailing list