[Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

Wed Mar 21 09:14:39 EDT 2012

On 20 March 2012 20:49, Olivier Delalleau <shish at keba.be> wrote:
> I doubt Theano is already as smart as you'd want it to be right now, however
> the core mechanisms are there to perform graph optimizations and move
> computations to GPU. It may save time to start from there instead of
> starting all over from scratch. I'm not sure though, but it looks like it
> would be worth considering it at least.

Thanks for the suggestion, Olivier. As Dag said, we discussed it, and
indeed we (or I) should look a lot deeper into it, see which
components are reusable there, and discuss with the Theano community
whether and how we can collaborate.

> -=- Olivier
>
> On 20 March 2012 15:40, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no>
> wrote:
>
>> We talked some about Theano. There are some differences in project goals
>> which mean that it makes sense to make this a separate project: Cython
>> wants to use this to generate C code up front from the Cython AST at
>> compilation time; numba also has a different frontend (parsing of python
>> bytecode) and a different backend (LLVM).
>>
>> However, it may very well be possible that Theano could be refactored so
>> that the more essential algorithms working on the syntax tree could be
>> pulled out and shared with cython and numba. Then the question is whether
>> the core of Theano is smart enough to compete with Fortran compilers and
>> support arbitrarily strided inputs optimally. Otherwise one might as well
>> start from scratch. I'll leave that for Mark to figure out...
>>
>> Dag
>> --
>> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
>>
>>
>> Olivier Delalleau <shish at keba.be> wrote:
>>>
>>> This sounds a lot like Theano, did you look into it?
>>>
>>> -=- Olivier
>>>
>>> On 20 March 2012 13:49, mark florisson <markflorisson88 at gmail.com>
>>> wrote:
>>>>
>>>> On 13 March 2012 18:18, Travis Oliphant <travis at continuum.io> wrote:
>>>> >>>
>>>> >>> (Mark F., how does the above match how you feel about this?)
>>>> >>
>>>> >> I would like collaboration, but from a technical perspective I think
>>>> >> this would be much more involved than just dumping the AST to an IR
>>>> >> and generating some code from there. For vector expressions I think
>>>> >> sharing code would be more feasible than arbitrary (parallel) loops,
>>>> >> etc. Cython as a compiler can make many decisions that a Python
>>>> >> (bytecode) compiler can't make (at least without annotations and a
>>>> >> well-defined subset of the language (not so much the syntax as the
>>>> >> semantics)). I think in numba, if parallelism is to be supported, you
>>>> >> will want a prange-like construct, as proving independence between
>>>> >> iterations can be very hard to near impossible for a compiler.
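To make the point about proving independence concrete, here is a minimal pure-Python sketch. `parallel_for` is a hypothetical stand-in for a prange-like construct, not Cython's or numba's API: the caller, rather than the compiler, asserts that iterations do not depend on each other. The second loop shows a loop-carried dependence that such a construct must not be applied to.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_for(n, body, workers=4):
    """Hypothetical prange stand-in: call body(i) for every i in range(n).

    The caller promises that iterations are independent, so they may run
    in any order and on any worker thread.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so exceptions raised in body() surface here
        list(pool.map(body, range(n)))


values = [1, 2, 3, 4, 5, 6, 7, 8]

# Independent iterations: squares[i] depends only on values[i].
squares = [0] * len(values)


def square(i):
    squares[i] = values[i] * values[i]


parallel_for(len(values), square)

# Loop-carried dependence: iteration i reads what iteration i - 1 wrote,
# so this loop must stay sequential (or be rewritten as a parallel scan).
running = list(values)
for i in range(1, len(running)):
    running[i] += running[i - 1]
```

Threads are used here only to illustrate the contract; in CPython they would not speed up this particular loop.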
>>>> >
>>>> > I completely agree that you have to define some kind of syntax to get
>>>> > parallelism.  But, a prange construct would not be out of the question, of
>>>> > course.
>>>> >
>>>> >>
>>>> >> As for code generation, I'm not sure how llvm would do things like
>>>> >> slicing arrays, reshaping, resizing etc (for vector expressions you
>>>> >> can first evaluate all slicing and indexing operations and then
>>>> >> compile the remaining vector expression), but for loops and array
>>>> >> reassignment within loops this would have to invoke the actual slicing
>>>> >> code from the llvm code (I presume).
>>>> >
>>>> > There could be some analysis on the byte-code, prior to emitting the
>>>> > llvm code, in order to handle lots of things. Basically, you have to
>>>> > "play" the byte-code on a simple machine anyway in order to emit the
>>>> > correct code. The big thing about Cython is you have to typedef too many
>>>> > things that are really quite knowable from the code. If Cython could
>>>> > improve its type inference, then it would be a more suitable target.
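As a rough illustration of "playing" the byte-code on a simple machine, the sketch below walks a function's instructions with the standard `dis` module and tracks only the depth of the value stack. This is not how numba does it; it is a minimal stand-in, it assumes straight-line code (branches are not followed), and `simulate_stack` and `example` are names made up for the illustration.

```python
import dis


def simulate_stack(func):
    """Walk func's bytecode in order and track the value-stack depth.

    Only valid for straight-line code: branches are not followed.
    Returns (max_depth, final_depth).
    """
    depth = 0
    max_depth = 0
    for instr in dis.get_instructions(func):
        # stack_effect reports how many values the instruction pushes
        # (positive) or pops (negative)
        depth += dis.stack_effect(instr.opcode, instr.arg)
        max_depth = max(max_depth, depth)
    return max_depth, depth


def example(a, b):
    return a + b * 2


max_depth, final_depth = simulate_stack(example)
```

A real translator would keep a stack of typed values (or LLVM values) instead of a bare counter, but the control structure is the same: one pass over the instructions, updating a model of the machine state.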
>>>> >
>>>> >> There are many other things, like bounds checking, wraparound, etc,
>>>> >> that are all supported in both numpy and Cython, but going through an
>>>> >> llvm layer would, as far as I can see, require re-implementing those,
>>>> >> at least if you want top-notch performance. Personally, I think for
>>>> >> non-trivial performance-critical code (for loops with indexing,
>>>> >> slicing, function calls, etc) Cython is a better target.
>>>> >
>>>> > With libclang it is really quite possible to imagine a cython -> C
>>>> > target that itself compiles to llvm so that you can do everything at
>>>> > that intermediate layer. However, LLVM is a much better layer for
>>>> > optimization than C now that there are a lot of people collaborating on
>>>> > that layer. I think it would be great if Cython targeted LLVM actually
>>>> > instead of C.
>>>> >
>>>> >>
>>>> >> Finally, as for non-vector-expression code, I really believe Cython is
>>>> >> a better target. cython.inline can have high overhead (at least the
>>>> >> first time it has to compile), but with better (numpy-aware) type
>>>> >> inference or profile-guided optimizations (see recent threads on the
>>>> >> cython-dev mailing list), in addition to things like prange, I
>>>> >> personally believe Cython targets most of the use cases where numba
>>>> >> would be able to generate well-performing code.
>>>> >
>>>> > Cython and Numba certainly overlap.  However, Cython requires:
>>>> >
>>>> >        1) learning another language
>>>> >        2) creating an extension module --- loading bit-code files and
>>>> > dynamically executing (even on a different machine from the one that
>>>> > initially created them) can be a powerful alternative for run-time
>>>> > compilation and distribution of code.
>>>> >
>>>> > These aren't show-stoppers, obviously. But I think some users would
>>>> > prefer an even simpler approach to getting fast code than Cython (which
>>>> > currently doesn't do enough type inference and requires building a
>>>> > dlopen extension module).
>>>>
>>>> Dag and I have been discussing this at PyCon, and here is my take on
>>>> it (at this moment :).
>>>>
>>>> Definitely, if you can avoid Cython then that is easier and more
>>>> desirable in many ways. So perhaps we can create a third project
>>>> called X (I'm not very creative, maybe ArrayExprOpt), that takes an
>>>> abstract syntax tree in a rather simple form, performs code
>>>> optimizations such as rewriting loops with array accesses to vector
>>>> expressions, fusing vector expressions and loops, etc, and spits out a
>>>> transformed AST containing these optimizations. If runtime information
>>>> such as actual shape and stride information is given, the
>>>> transformations could figure out there and then whether to do things
>>>> like collapsing, axes swapping, blocking (as in, introducing more axes
>>>> or loops to retain discontiguous blocks in the cache), blocked memory
>>>> copies to contiguous chunks, etc. The AST could then also say whether
>>>> the final expressions are vectorizable. Part of this functionality is
>>>> already in numpy's nditer, except that this would be implicit and do
>>>> more (and hopefully with minimal overhead).
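A small sketch of what deciding "there and then" from shape and stride information could look like, covering only axis swapping and collapsing for a single operand of an elementwise operation. `collapse_axes` is a hypothetical helper, not part of numpy or of any existing project; strides are in bytes, as in numpy. With several operands, all of them would have to agree on the reordering and merging.

```python
def collapse_axes(shape, strides):
    """Reorder and merge the axes of one strided operand.

    Only valid for elementwise operations, where the order in which
    elements are visited does not matter. Returns (shape, strides) of an
    equivalent view with as few axes as possible.
    """
    # Axis swapping: visit axes from largest to smallest stride, so the
    # innermost loop walks the elements that are closest in memory.
    axes = sorted(zip(shape, strides), key=lambda axis: -abs(axis[1]))

    out_shape, out_strides = [], []
    for extent, stride in axes:
        if extent == 1:
            continue  # a length-1 axis never affects addressing
        if out_shape and out_strides[-1] == extent * stride:
            # One step along the previous axis equals `extent` steps along
            # this one, so the two axes can be walked as a single axis.
            out_shape[-1] *= extent
            out_strides[-1] = stride
        else:
            out_shape.append(extent)
            out_strides.append(stride)
    return tuple(out_shape), tuple(out_strides)


# C-contiguous 3x4x5 array of 8-byte elements: one flat loop suffices.
c_order = collapse_axes((3, 4, 5), (160, 40, 8))

# Fortran-ordered 3x4 array: swapping the axes makes it collapsible too.
f_order = collapse_axes((3, 4), (8, 24))

# Every other row of the middle axis taken: two loops remain.
sliced = collapse_axes((3, 2, 5), (160, 80, 8))
```

Blocking and blocked copies to contiguous chunks would be further decisions layered on top of the collapsed shape.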
>>>>
>>>> So numba, Cython and maybe numexpr could use the functionality, simply
>>>> by building the AST from Python and converting back (if necessary) to
>>>> its own AST. As such, the AST optimizer would be only part of any
>>>> (runtime) compiler's pipeline, and it should be very flexible to
>>>> retain any information (metadata regarding actual types, control flow
>>>> information, etc) provided by the original AST. It would not do
>>>> control flow analysis, type inference or promotion, etc, but only deal
>>>> with abstract types like integers, reals and arrays (C, Fortran or
>>>> partly contiguous or strided). It would not deal with objects, but
>>>> would allow inserting nodes like UnreorderableNode and SideEffectNode
>>>> that wrap parts of the original AST. In short, it should be as easy as
>>>> possible to convert from an original AST to this project's AST and
>>>> back again afterwards.
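The round trip with opaque wrapper nodes might be sketched as follows. Only the name SideEffectNode comes from the proposal above; the node classes, their fields and the `fold_constants` pass are assumptions made for illustration, and constant folding merely stands in for the real optimizations (fusion, loop rewriting). The point is that a pass rewrites the parts it understands and hands wrapped host-compiler nodes back untouched.

```python
import operator
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class ArrayRef:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str
    left: Any
    right: Any


@dataclass(frozen=True)
class SideEffectNode:
    payload: Any  # a node of the host compiler's AST; never inspected


OPS = {"+": operator.add, "*": operator.mul}  # only these two, for brevity


def fold_constants(node):
    """Stand-in optimization pass: fold constant subexpressions."""
    if isinstance(node, BinOp):
        left = fold_constants(node.left)
        right = fold_constants(node.right)
        if isinstance(left, Const) and isinstance(right, Const):
            return Const(OPS[node.op](left.value, right.value))
        return BinOp(node.op, left, right)
    # Leaves and wrapper nodes pass through unchanged.
    return node


host_node = object()  # pretend this came from Cython's or numba's own AST
tree = BinOp("+", SideEffectNode(host_node),
             BinOp("*", Const(2.0), Const(3.0)))
optimized = fold_constants(tree)
```

After the pass, `optimized.left.payload` is still the very same host object, so converting back to the original AST only requires unwrapping it.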
>>>>
>>>> As the project matures many optimizations may be added that deal with
>>>> all sorts of loop restructuring and ways to efficiently utilize the
>>>> cache as well as enable vectorization and possibly parallelism.
>>>> Perhaps it could even generate a different AST depending on whether
>>>> execution targets the CPU or the GPU (with optionally available
>>>> information such as cache sizes, GPU shared/local memory sizes, etc).
>>>>
>>>> Seeing that this would be part of my master's dissertation, my
>>>> supervisor would require me to write the code, so at least until
>>>> August I think I would have to write (at least the bulk of) this.
>>>> Otherwise I can also make other parts of my dissertation's project
>>>> more prominent to make up for it. Anyway, my question is: is there
>>>> interest from at least the numba and numexpr projects? (If code can be
>>>> transformed into vector operations, it makes sense to use numexpr for
>>>> that; I'm not sure what numba's interest is in that.)
>>>>
>>>> > -Travis
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >>
>>>> >>> Dag
>>>> >>> _______________________________________________
>>>> >>> NumPy-Discussion mailing list
>>>> >>> NumPy-Discussion at scipy.org
>>>> >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion