[pypy-dev] An idea about automatic parallelization in PyPy/RPython

haael at interia.pl
Fri Nov 21 10:55:18 CET 2014



Hi

I would suggest a different approach, more similar to Armin's idea of parallelization.

You could just optimistically assume that the loop is parallelizable: execute a few steps at once (each in its own memory sandbox) and check for conflicts later. This also plays nicely with STM.

So, the general solution would look like this (a code sketch follows the list):
1. Split the loop into individual step invocations.
2. Reserve a tiny memory block for each loop step.
3. Optional: compile the loop step into OpenCL.
4. Execute every loop step in parallel, saving the changes made by each invocation to its individual memory block.
5. Check whether the changes conflict.
6. If not, merge them and commit to global memory.
7. If they do, fall back to serial loop execution.
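
Here is a minimal Python sketch of this scheme, under simplifying assumptions: each loop step is a function of its index that returns its writes as a private buffer, and only write/write conflicts are checked (real traces would also need read/write tracking). All names are illustrative.

    from concurrent.futures import ThreadPoolExecutor

    def step(i, memory):
        # Hypothetical loop body: read shared `memory`, return writes
        # in a private {address: value} buffer (the "memory sandbox").
        return {i: memory.get(i, 0) + 1}

    def run_optimistically(n, memory):
        # Steps 1-4: run every iteration speculatively, in parallel.
        with ThreadPoolExecutor() as pool:
            buffers = list(pool.map(lambda i: step(i, memory), range(n)))
        # Step 5: detect conflicts (two iterations wrote one address).
        written = set()
        for buf in buffers:
            if written.intersection(buf):
                # Step 7: conflict, so fall back to serial execution.
                for i in range(n):
                    memory.update(step(i, memory))
                return memory
            written.update(buf)
        # Step 6: no conflicts, merge all buffers into global memory.
        for buf in buffers:
            memory.update(buf)
        return memory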

I think all the building blocks are here, in particular the recent GIL removal and Armin's STM research.

It sounds exciting.

Cheers
haael


Od: "黄若尘" <hrc706 at gmail.com>
Do: "Maciej Fijalkowski" <fijall at gmail.com>; 
Wysłane: 7:49 Piątek 2014-11-21
Temat: Re: [pypy-dev] An idea about automatic parallelization in	PyPy/RPython

> Yes, I actually knew about this way of getting traces.
> 
> Well, what I mean is that I want to handle those traces at RUNTIME. I want to insert some code into RPython's JIT to detect traces which can be executed in parallel; if a trace can be, COMPILE it into OpenCL code (then into native code, and run it), and if not, compile it to normal native code as RPython does now. So it is probably important to first intercept the normal compilation and analyze the trace.
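> 
> A toy sketch of such a runtime check (this is not RPython's real trace
> API; the op format and all names here are purely illustrative): scan
> the ops of one loop iteration and reject the loop if any iteration
> writes a location that other iterations might touch.
> 
>     def is_parallelizable(trace_ops):
>         # trace_ops: list of (opname, target) tuples for one iteration.
>         # target is "loop_local" (private), "indexed" (arr[i]-style,
>         # disjoint per iteration), or "shared" (same location each time).
>         for opname, target in trace_ops:
>             if opname in ("setfield", "setarrayitem") and target == "shared":
>                 return False   # cross-iteration write: keep it serial
>         return True
> 
>     # If the check passes, the trace could be handed to an OpenCL
>     # backend; otherwise compile it normally, as RPython does today.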
> 
> I’m sorry for my bad English and bad expression.
> 
> > On Nov 21, 2014, at 3:36 PM, Maciej Fijalkowski wrote:
> > 
> > You get traces by running PYPYLOG=jit-log-opt,jit-backend: pypy ....
> > 
> > There is a tool called jitviewer for viewing those traces. OpenCL is
> > likely just written in C, and the kernel itself does not contain any
> > Python.
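> > 
> > For example, a concrete invocation (the log file name is arbitrary):
> > 
> >     PYPYLOG=jit-log-opt,jit-backend:trace.log pypy myprog.py
> > 
> > after which jitviewer can be pointed at the resulting trace.log.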
> > 
> > On Fri, Nov 21, 2014 at 3:17 AM, 黄若尘  wrote:
> >> Hi Fijalkowski,
> >> 
> >> Thank you very much for your reply.
> >> 
> >> Yes, you are right, it's too hard for me to implement automatic parallelization for the whole of PyPy's tracing JIT. I think I can first do some work with a very simple interpreter (for example, the example interpreter introduced in the PyPy documentation) and try to change some behaviors of the RPython JIT.
> >> 
> >> By the way, could you tell me how I can get the traces and handle them before they are compiled to native code? I just want to try to convert some of the traces to OpenCL kernel code and run them on other devices, like GPUs.
> >> 
> >> Best Regards,
> >> Huang Ruochen
> >> 
> >>> On Nov 21, 2014, at 12:05 AM, Maciej Fijalkowski wrote:
> >>> 
> >>> Hi 黄若尘
> >>> 
> >>> This is generally a hard problem, on which projects like GCC or LLVM
> >>> didn't get very far. The situation is slightly better with PyPy's JIT,
> >>> but not by much.
> >>> 
> >>> However, the catch is that while you can do it for simple loops, the
> >>> applications are limited outside of pure numerics (e.g. numpy); doing
> >>> SSE stuff in such cases first seems like both a good starting point
> >>> and a small enough project for a master's thesis.
> >>> 
> >>> Cheers,
> >>> fijal
> >>> 
> >>> On Tue, Nov 18, 2014 at 3:46 AM, 黄若尘  wrote:
> >>>> Hi everyone,
> >>>> 
> >>>>  I'm a master's student in Japan and I want to do some research on PyPy/RPython.
> >>>>  I have read some papers about PyPy and I have some ideas about it. I have communicated with Mr. Bolz and was advised to send my question here.
> >>>> 
> >>>>  Actually, I wonder whether it is possible to do automatic parallelization of the traces generated by the JIT, that is, check whether a hot loop is a parallel loop, and if so, try to run the trace in parallel on a multi-core CPU or a GPU to make it faster.
> >>>>  I think it may be a good fit because:
> >>>>  1. The trace-based JIT targets loops, which map directly onto parallel computation.
> >>>>  2. There is no control flow inside a trace, which suits GPU fragment programs.
> >>>>  3. We may use the @elidable hint in interpreter code, since elidable functions are insensitive to execution order and so can be executed in parallel (a short sketch follows).
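> >>>> 
> >>>>  As a sketch of point 3 (the @elidable hint from rpython.rlib.jit is
> >>>>  real; the surrounding names are hypothetical):
> >>>> 
> >>>>      from rpython.rlib.jit import elidable
> >>>> 
> >>>>      @elidable
> >>>>      def lookup_method(cls, name):
> >>>>          # Elidable: the result depends only on the arguments, so
> >>>>          # the JIT may reorder, fold, or parallelize these calls.
> >>>>          return cls.method_table.get(name)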
> >>>> 
> >>>>  What do you think about it?
> >>>> 
> >>>> Best Regards,
> >>>> Huang Ruochen
> >>>> _______________________________________________
> >>>> pypy-dev mailing list
> >>>> pypy-dev at python.org
> >>>> https://mail.python.org/mailman/listinfo/pypy-dev
> >> 
> 