[Numpy-discussion] numpy.random and multiprocessing

Bruce Southey bsouthey at gmail.com
Thu Dec 11 13:00:23 EST 2008


David Cournapeau wrote:
> Sturla Molden wrote:
>   
>> On 12/11/2008 6:10 PM, Michael Gilbert wrote:
>>
>>> Shouldn't numpy (and/or multiprocessing) be smart enough to prevent
>>> this kind of error?  A simple enough solution would be to also include
>>> the process id as part of the seed 
>> It would not help, as the seeding is done prior to forking.
>>
>> I am mostly familiar with Windows programming. But what is needed is a 
>> fork handler (similar to a system hook in Windows jargon) that sets a 
>> new seed in the child process.
>>
>> Could pthread_atfork be used?
>
> The seed could be explicitly set in each task, no ?
>
> def task(x):
>     np.random.seed()
>     return np.random.random(x)
>
> But does this really make sense ?
>
> Is the goal to parallelize a big sampler into N tasks of M trials, to
> produce the same result as a sequential set of M*N trials ? Then it does
> not sound like a trivial task at all. I know there exist libraries
> explicitly designed for parallel random number generation - maybe this
> is where we should look, instead of using heuristics which are likely to
> be bogus, and generate wrong results.
>
> cheers,
>
> David
This is not sufficient because you cannot ensure that the seed will be
different every time task() is called.
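Rather than reseeding from an unknown source inside task(), the parent can choose the seeds itself. A minimal sketch, assuming the parent is free to pick one explicit integer seed per task: each task builds its own np.random.RandomState from that seed, so no two tasks can start from the same state. The names task and run_tasks are hypothetical helpers, not part of NumPy. Distinct seeds guarantee distinct starting states only, not statistically independent streams.

```python
import numpy as np


def task(args):
    """Draw `n` uniforms from a private generator seeded with `seed`.

    Each call builds its own RandomState, so it never touches the global
    np.random state that a forked child inherits from its parent.
    """
    seed, n = args
    rng = np.random.RandomState(seed)
    return rng.random_sample(n)


def run_tasks(base_seed, n_tasks, n):
    # The parent assigns one seed per task, so no two tasks can start from
    # the same state (unlike calling np.random.seed() with no argument).
    jobs = [(base_seed + i, n) for i in range(n_tasks)]
    # Plain map keeps the sketch self-contained; multiprocessing.Pool(...).map
    # takes the same (function, iterable) arguments.
    return list(map(task, jobs))


if __name__ == "__main__":
    for r in run_tasks(base_seed=12345, n_tasks=4, n=3):
        print(r)
```

Consecutive seeds (base_seed + i) are simply the easiest choice for illustration, not a vetted scheme for avoiding correlated streams.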

A major part of the problem here is treating a parallel computing
problem as a serial computing problem. The streams must be independent
across threads; in particular, they must avoid cross-correlation between
streams (another gotcha). It is up to the user to implement a
thread-safe solution, such as using a single stream that is shared by all
threads or forcing the different threads to start from different states.
The only thing that NumPy could do is provide a parallel pseudo-random
number generator.
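Both remedies can be sketched with the Generator and SeedSequence API that NumPy added in version 1.17, long after this thread (none of it existed in 2008). SeedSequence.spawn is the mechanism NumPy's own documentation suggests for deriving per-worker seeds. The function names below are hypothetical, and the sketch assumes threads; the per-worker generators of option 2 apply equally to processes.

```python
import threading

import numpy as np


def shared_stream_draws(seed, n_threads, n):
    """Option 1: every thread draws from one generator guarded by a lock.

    Together the draws form a single stream, but which numbers land in
    which thread depends on scheduling, so per-thread results vary.
    """
    rng = np.random.default_rng(seed)
    lock = threading.Lock()
    out = [None] * n_threads

    def worker(i):
        with lock:  # only one thread touches the shared generator at a time
            out[i] = rng.random(n)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


def spawned_stream_draws(seed, n_workers, n):
    """Option 2: each worker gets its own generator, started from a child
    SeedSequence so that the starting states are derived to be distinct.
    """
    children = np.random.SeedSequence(seed).spawn(n_workers)
    # In a real program each child would be handed to a separate thread or
    # process; a plain loop keeps the sketch deterministic.
    return [np.random.default_rng(child).random(n) for child in children]


if __name__ == "__main__":
    print(shared_stream_draws(0, 4, 3))
    print(spawned_stream_draws(0, 4, 3))
```

Option 1 keeps one well-defined stream at the cost of contention and run-to-run variation in which thread gets which numbers; option 2 is reproducible per worker but relies on the seeding scheme to keep the streams from overlapping or correlating.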


Bruce


