Using Python for processing of large datasets (convincing management)

Thomas Jensen spam at ob_scure.dk
Sat Jul 6 18:13:21 EDT 2002


Alex Martelli wrote:

[snip]

> With Python, you can exploit multiple CPUs only by multi-*processing* --
> and here, it's possible that Windows' multi-processing inefficiencies
> may byte you (with Unix-like systems, often multiple processes or
> multiple threads in one process have quite comparable performance).

Ok, thanks.
The actual job is easily parallelisable (is that a word? :-) in that it 
can be broken into a number (about 500) of calls to a function that 
takes one integer as input, i.e.
     calcUnit(unitnum)
(This assumes that a database connection is available to the function 
through a class or global variable.)

I was planning on spawning one single-threaded XMLRPC-server per CPU per 
machine and then having a control process on one of the machines with a 
thread per process. These threads would fetch unit numbers from a Queue 
object and call the XMLRPC server using xmlrpclib.
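
Roughly what I have in mind, as a minimal sketch rather than a finished 
design. It is written against the current standard library, where 
xmlrpclib lives on as xmlrpc.client. The body of calc_unit, the names 
serve, worker and run_controller, and the squared return value are all 
placeholders made up for illustration; the real calcUnit would do the 
database work.

```python
import queue
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def calc_unit(unitnum):
    """Placeholder for calcUnit(unitnum); the real one would hit the database."""
    return unitnum * unitnum


def serve(host="127.0.0.1", port=0):
    """Build a single-threaded XML-RPC server exposing calcUnit.

    The plan is to run one of these per CPU, each in its own process.
    Port 0 asks the OS for any free port (an assumption for the sketch).
    """
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(calc_unit, "calcUnit")
    return server


def worker(url, units, results):
    """Controller thread: pull unit numbers off the queue, call one server."""
    proxy = xmlrpc.client.ServerProxy(url)  # one proxy per thread
    while True:
        try:
            unitnum = units.get_nowait()
        except queue.Empty:
            return  # queue drained, this thread is done
        results[unitnum] = proxy.calcUnit(unitnum)


def run_controller(urls, unit_numbers):
    """Start one thread per server URL and wait until every unit is done."""
    units = queue.Queue()
    for n in unit_numbers:
        units.put(n)
    results = {}
    threads = [threading.Thread(target=worker, args=(url, units, results))
               for url in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each server process would call serve(port=...).serve_forever(), and the 
control process would call run_controller with one URL per server 
process and range(500) as the unit numbers. The controller threads spend 
their time blocked on network I/O, so the CPU work happens in the server 
processes.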

Am I correct in believing that this would utilize all CPUs? (Windows 
issues aside).

-- 
Best Regards
Thomas Jensen
(remove underscore in email address to mail me)



