Parallelization of Python on GPU?

John Ladasky john_ladasky at sbcglobal.net
Wed Feb 25 21:35:18 EST 2015


I've been working with machine learning for a while.  Many of the standard packages (e.g., scikit-learn) have fitting algorithms which run in a single thread.  These algorithms are not themselves parallelized.  Perhaps, due to their mathematical structure, they cannot be parallelized.

When one is investigating several potential models of one's data with various settings for free parameters, it is still sometimes possible to speed things up.  On a modern machine, one can use Python's multiprocessing.Pool to run separate instances of scikit-learn fits.  I am currently using ten of the twelve 3.3 GHz CPU cores on my machine to do just that.  And I can still browse the web with no observable lag.  :^)
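For concreteness, here is a minimal sketch of what that looks like.  The toy data and parameter grid are made up for illustration; the real training set and grid are much larger:

    from multiprocessing import Pool

    import numpy as np
    from sklearn.svm import SVR

    # Toy data standing in for the real training set.
    rng = np.random.RandomState(0)
    X = rng.rand(500, 4)
    y = np.sin(X[:, 0]) + 0.1 * rng.randn(500)

    def fit_one(params):
        # Fit one SVR with a single setting of the free parameters.
        C, epsilon = params
        model = SVR(C=C, epsilon=epsilon)
        model.fit(X, y)
        return params, model.score(X, y)

    param_grid = [(C, eps) for C in (1.0, 10.0, 100.0)
                           for eps in (0.01, 0.1)]

    if __name__ == '__main__':
        # Ten worker processes, each running one single-threaded fit at a time.
        with Pool(processes=10) as pool:
            for params, score in pool.map(fit_one, param_grid):
                print(params, score)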

Still, I'm waiting hours for jobs to finish.  Support vector regression fitting is hard.

What I would REALLY like to do is to take advantage of my GPU.  My NVidia graphics card has 1152 cores and a 1.0 GHz clock.  I wouldn't mind borrowing a few hundred of those GPU cores at a time to see what they can do.  In theory, I calculate that I could speed up the job by another five-fold.
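The arithmetic behind that estimate is only a comparison of aggregate clock cycles, and it is admittedly optimistic; the number of borrowed cores below is an illustrative figure, not a measurement:

    # Back-of-envelope only: assumes one 1.0 GHz GPU core does about as much
    # useful work per clock as one CPU core, which is generous to the GPU.
    cpu_throughput = 10 * 3.3                   # ten CPU cores at 3.3 GHz
    gpu_cores_borrowed = 165                    # "a few hundred" -- illustrative
    gpu_throughput = gpu_cores_borrowed * 1.0   # GPU clock is 1.0 GHz
    print(gpu_throughput / cpu_throughput)      # -> 5.0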

The trick is that each process would need to run some PYTHON code, not CUDA or OpenCL.  The child process code isn't particularly fancy.  (I should, for example, be able to switch that portion of my code to static typing.)
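For the sake of illustration, something along the lines of Numba's CUDA target seems like the shape I'm after: plain Python with inferable types, compiled for the GPU.  I haven't confirmed that it would handle my actual child-process code; the kernel, array sizes, and launch configuration below are placeholders of my own:

    import numpy as np
    from numba import cuda

    @cuda.jit
    def scale_and_sum(x, y, out):
        # Each GPU thread handles one array element.
        i = cuda.grid(1)
        if i < x.shape[0]:
            out[i] = 2.0 * x[i] + y[i]

    n = 1000000
    x = np.arange(n, dtype=np.float32)
    y = np.ones(n, dtype=np.float32)
    out = np.zeros_like(x)

    threads_per_block = 128
    blocks = (n + threads_per_block - 1) // threads_per_block
    scale_and_sum[blocks, threads_per_block](x, y, out)   # results copied back to out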

What is the most effective way to accomplish this task?

I came across a reference to a package called "Urutu" which may be what I need; however, it doesn't look like it is widely supported.

I would love it if the Python developers themselves added the ability to spawn GPU processes to the multiprocessing module!

Thanks for any advice and comments.


