[Cython] gsoc: array expressions

Fri May 25 22:57:24 CEST 2012

I just resended this email as it was rejected by the mailing list. So
I subscribed to it.

Hi,

Sorry for the delay, I had some schedule change.

thanks for adding me. Should I subscribe to cython-dev? How much email
daily there is? I didn't found this on the archives. Fell free to add
me in CC again when you think it is appropriate.

I'll reply here to all email at the same time. Do you prefer that I
reply to each email individually if this happen again? I'll try to
reply faster next time.

- About pickling theano, we currently can't pick Theano function. It
could be made to work in some cases, but not for all cases as there is
hardware dependent optimization in the Theano function. Currently it
is mostly CPU vs GPU operation. So if we stay on the CPU, we could do
some pickling, but we should make sure that the compiled c code into
python module are still there when we unpickle or recompile them.

- I think it make sense to make a theano graph from cython ast,
optimize and redo a cython ast from the optimized graph. This would
allow using Theano optimizations.

- It also make sense to do the code generation in Theano and reuse it
in Cython. But this would make the Theano dependency much stronger.
I'm not sure you want this.

- Another point not raised, theano need to know at compile time is the
dtype, number of dimensions and witch dimensions are broadcastable for
each variable. I think that the last one could cause problem, but if
you use specialization for the dtype, the same can be done for the
broadcsatability of a dimensions.

- The compyte(gpu nd array) project do collapsing of dimensions. This
is an important optimization on the GPU as doing the index computation
in parallel is costlier. I think on the CPU we could probably do
collapsing just of the inner dimensions to make it faster.

- Theano don't generate intrinsect or assembly, but we suppose that
g++ will generate vectorized operation for simple loop. Recent version
of gcc/g++ do this.

- Our generated code for element-wise operation take care a little
about the memory access pattern. We swap dimensions to iterate on the
dimensions with the smallest strides. But we don't go further.

- What do you mean by CSE? Constant  optimization?

Fred