Using Python for processing of large datasets (convincing management)

Anders Dahlberg sneavreet+nospam at innocent.com
Sun Jul 7 06:36:26 EDT 2002


Hello, comments inline
> > With Python, you can exploit multiple CPUs only by multi-*processing* --
> > and here, it's possible that Windows' multi-processing inefficiencies
> > may bite you (with Unix-like systems, often multiple processes or
> > multiple threads in one process have quite comparable performance).
>
> Ok, thanks.
> The actual job is easily parallelisable (is that a word? :-) in that it
> can be broken into a number (about 500) of calls to a function that
> takes one integer as input, ie.
>      calcUnit(unitnum)
> (This assumes that a database connection is available through a class or
> global variable to the function).
>
> I was planning on spawning one single-threaded XMLRPC-server per CPU per
> machine and then having a control process on one of the machines with a
> thread per process. These threads would fetch unit numbers from a Queue
> object and call the XMLRPC server using xmlrpclib.
>
> Am I correct in believing that this would utilize all CPUs? (Windows
> issues aside).
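(A minimal sketch of the scheme quoted above - one single-threaded XML-RPC
server per CPU, and a controller with one thread per server pulling unit
numbers from a Queue. Module names are the modern spellings; in 2002-era
Python these were SimpleXMLRPCServer and xmlrpclib. calcUnit here is just a
stand-in for the real database-backed function.)

```python
# Controller/worker sketch: one thread per XML-RPC server, each pulling
# unit numbers off a shared Queue until it is empty.
import threading
from queue import Queue, Empty
from xmlrpc.server import SimpleXMLRPCServer  # SimpleXMLRPCServer in Py2
from xmlrpc.client import ServerProxy         # xmlrpclib in Py2

def calcUnit(unitnum):
    # stand-in for the real per-unit calculation (which would use the
    # database connection available to the server process)
    return unitnum * unitnum

def serve():
    # start one single-threaded XML-RPC server; port 0 = pick a free port
    server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
    server.register_function(calcUnit)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.server_address[1]

def controller(ports, units):
    work = Queue()
    for u in units:
        work.put(u)
    results = {}
    lock = threading.Lock()

    def worker(port):
        # each controller thread talks to exactly one server
        proxy = ServerProxy("http://localhost:%d" % port)
        while True:
            try:
                u = work.get_nowait()
            except Empty:
                return
            r = proxy.calcUnit(u)
            with lock:
                results[u] = r

    threads = [threading.Thread(target=worker, args=(p,)) for p in ports]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In real use the servers would be separate processes on the worker machines
rather than threads in one process, but the controller side is the same.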

Newbie argument:

Why not consider using jython?
Same scripting language as Python, but better scaling to multiple CPUs
(Jython threads map to JVM threads, with no global interpreter lock) - seems
at least to me an easier solution than XML-RPC?

(maybe it's easier to sell the idea to your boss too, due to Java hype and
all ;)
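(To illustrate the point: because the JVM has no global interpreter lock,
plain Python threads calling calcUnit directly can use all CPUs under
Jython - no XML-RPC layer at all. A sketch, using modern module spellings;
on Jython 2 the queue module is spelled Queue, and calcUnit is again a
stand-in for the real thread-safe function.)

```python
# Thread-pool sketch: N threads draining a Queue of unit numbers and
# calling calcUnit directly in-process. Under Jython these threads run
# on separate JVM threads and so can run on separate CPUs; under
# CPython the GIL serializes them.
import threading
from queue import Queue, Empty

def calcUnit(unitnum):
    # stand-in for the real database-backed calculation
    return unitnum * unitnum

def run_all(units, nthreads=4):
    work = Queue()
    for u in units:
        work.put(u)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                u = work.get_nowait()
            except Empty:
                return
            r = calcUnit(u)
            with lock:
                results[u] = r

    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```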

> Best Regards
> Thomas Jensen
/Anders




