Parallelization of Python on GPU?

Sturla Molden sturla.molden at gmail.com
Thu Feb 26 15:54:21 EST 2015


On 26/02/15 18:34, John Ladasky wrote:

> Hi Sturla,  I recognize your name from the scikit-learn mailing list.
>
> If you look a few posts above yours in this thread, I am aware of gpu-libsvm.  I don't know if I'm up to the task of reusing the scikit-learn wrapping code, but I am giving that option some serious thought.  It isn't clear to me that gpu-libsvm can handle both SVM and SVR, and I have need of both algorithms.
>
> My training data sets are around 5000 vectors long.  IF that graph on the gpu-libsvm web page is any indication of what I can expect from my own data (I note that they didn't specify the GPU card they're using), I might realize a 20x increase in speed.


A GPU is a "floating point monster", not a CPU. It is not designed to
run things like CPython. It is also designed to run only threads in
parallel on its cores, not processes. And as you know, Python has
something called the GIL. Further, the GPU has hard-wired fine-grained
load scheduling for data-parallel problems (e.g. matrix multiplication
for vertex processing in 3D graphics). A thread on a GPU is not
comparable to a thread on a CPU. It is more like an item in a parallel
work queue, with the kind of abstraction you find in Apple's GCD.

I don't think it is really doable to make something like CPython run
with thousands of parallel instances on a GPU. A GPU is not designed for
that. A GPU is great if you can pass millions of floating point vectors
as items to the work queue, with a tiny amount of computation per item.
It would be crippled if you passed it a thousand CPython interpreters
and expected them to do a lot of work.
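
To make the shape of such a problem concrete, here is a minimal sketch
in plain Python. It runs on the CPU and is not GPU code; the names
saxpy_item and run_work_queue are made up for illustration. The point is
only that every item is a tiny floating point computation that depends
on nothing but its own index, which is the kind of work a GPU scheduler
can spread over its cores.

```python
# Sketch only: runs on the CPU to show the *shape* of a data-parallel problem.
# saxpy_item and run_work_queue are hypothetical names, not part of any GPU API.

def saxpy_item(i, a, x, y):
    """One work item: a tiny floating point computation that depends only on index i."""
    return a * x[i] + y[i]


def run_work_queue(a, x, y):
    """Apply the same small kernel to every index.

    No item depends on any other item, so a GPU would be free to run
    them in any order and many at a time. Here they simply run in a loop.
    """
    return [saxpy_item(i, a, x, y) for i in range(len(x))]


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
result = run_work_queue(2.0, x, y)
print(result)
```

A CPython interpreter is the opposite kind of work item: large, full of
branches, and dependent on shared state.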

Also, as it is libSVM that does the math in your case, you need to get
libSVM to run on the GPU, not CPython.

In most cases the best hardware for parallel scientific computing 
(taking economy and flexibility into account) is a Linux cluster which 
supports MPI. You can then use mpi4py or Cython to use MPI from your 
Python code.
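
As a rough sketch of what the mpi4py route looks like (assuming mpi4py
and an MPI runtime are installed, and that the script is started with
something like "mpirun -n 4 python script.py"): each process takes a
slice of the data, computes a partial result, and the results are
combined on rank 0. The function partial_sum and the toy data are made
up for illustration. If mpi4py cannot be imported, the sketch falls
back to a single process.

```python
# Sketch only: assumes mpi4py plus an MPI runtime; falls back to one process otherwise.
# partial_sum and the toy data are hypothetical, chosen just to show the pattern.

try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()   # index of this process
    size = comm.Get_size()   # total number of processes
except ImportError:
    MPI = None
    comm, rank, size = None, 0, 1


def partial_sum(values, rank, size):
    """Sum the elements assigned to this rank (every size-th element, starting at rank)."""
    return sum(values[rank::size])


values = list(range(1000))
local = partial_sum(values, rank, size)

if comm is not None:
    # Combine the partial sums; only rank 0 receives the total, other ranks get None.
    total = comm.reduce(local, op=MPI.SUM, root=0)
else:
    total = local

if rank == 0:
    print("total:", total)
```

Each rank here is a full CPython process with its own interpreter, so
the GIL does not get in the way.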

Sturla





