[Numpy-discussion] low level optimization in NumPy and minivect

Mark Wiebe mwwiebe at gmail.com
Mon Jun 24 11:46:04 EDT 2013


On Wed, Jun 19, 2013 at 7:48 AM, Charles R Harris <charlesr.harris at gmail.com> wrote:

>
>
> On Wed, Jun 19, 2013 at 5:45 AM, Matthew Brett <matthew.brett at gmail.com> wrote:
>
>> Hi,
>>
>> On Wed, Jun 19, 2013 at 1:43 AM, Frédéric Bastien <nouiz at nouiz.org> wrote:
>> > Hi,
>> >
>> >
>> > On Mon, Jun 17, 2013 at 5:03 PM, Julian Taylor <jtaylor.debian at googlemail.com> wrote:
>> >>
>> >> On 17.06.2013 17:11, Frédéric Bastien wrote:
>> >> > Hi,
>> >> >
>> >> > I saw that Julian Taylor has recently been doing a lot of low-level
>> >> > optimization, like using SSE instructions. I think it is great.
>> >> >
>> >> > Last year, Mark Florisson released the minivect [1] project that he
>> >> > worked on during his master's thesis. minivect is a compiler for
>> >> > element-wise expressions that does some of the same low-level
>> >> > optimizations that Julian is doing in NumPy right now.
>> >> >
>> >> > Mark designed minivect in a way that allows it to be reused by other
>> >> > projects. It is now used by Cython and Numba, I think. I had planned
>> >> > to reuse it in Theano, but I haven't had the time to integrate it so
>> >> > far.
>> >> >
>> >> > What about reusing it in NumPy? I think some of Julian's
>> >> > optimizations aren't in minivect (I didn't check to confirm). But
>> >> > from what I heard, minivect doesn't implement reductions, and there
>> >> > is a pull request to optimize those in NumPy.
>> >>
>> >> Hi,
>> >> what I vectorized are just the really easy cases of unit-stride
>> >> contiguous operations, so the min/max reduction code that is now in
>> >> numpy is in essence pretty trivial.
>> >> minivect goes much further in optimizing general strided access and
>> >> broadcasting via loop optimizations (it seems to have a lot of overlap
>> >> with the graphite loop optimizer available in GCC [0]), so my code is
>> >> probably not of very much use to minivect.
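For concreteness, here is a minimal sketch of the kind of unit-stride SSE
loop being discussed, a float32 max reduction written with SSE intrinsics
(a hypothetical illustration, not numpy's actual code):

    #include <stddef.h>
    #include <xmmintrin.h>   /* SSE intrinsics */

    /* Sketch of a unit-stride SSE max reduction over contiguous
     * float32 data (assumes n >= 1; NaN handling is ignored for
     * simplicity). Not numpy's actual implementation. */
    float max_f32(const float *a, size_t n)
    {
        float m = a[0];
        size_t i = 0;
        if (n >= 4) {
            __m128 vmax = _mm_loadu_ps(a);          /* first 4 lanes */
            for (i = 4; i + 4 <= n; i += 4)
                vmax = _mm_max_ps(vmax, _mm_loadu_ps(a + i));
            float lanes[4];
            _mm_storeu_ps(lanes, vmax);             /* horizontal step */
            for (int k = 0; k < 4; k++)
                if (lanes[k] > m) m = lanes[k];
        }
        for (; i < n; i++)                          /* scalar tail */
            if (a[i] > m) m = a[i];
        return m;
    }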
>> >>
>> >> The most interesting part of minivect for numpy is probably the
>> >> optimization of broadcasting loops, which seem to be pretty
>> >> inefficient in numpy [0].
>> >>
>> >> Concerning the rest, I'm not sure how much of a bottleneck general
>> >> strided operations really are in common numpy-using code.
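To make the strided and broadcasting levels concrete, here is a toy
sketch of a strided binary inner loop in which a broadcast operand simply
gets a byte stride of 0 (a hypothetical illustration, not numpy's actual
inner loop):

    #include <stddef.h>

    /* Each operand advances by its own byte stride per element; a
     * broadcast operand is given stride 0, so the same value is
     * re-read every iteration. Hypothetical illustration only. */
    static void
    add_strided(char *out, ptrdiff_t out_stride,
                const char *a, ptrdiff_t a_stride,
                const char *b, ptrdiff_t b_stride,
                size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            *(double *)out = *(const double *)a + *(const double *)b;
            out += out_stride;   /* e.g. sizeof(double) if contiguous */
            a += a_stride;
            b += b_stride;       /* 0 if b is broadcast */
        }
    }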
>> >>
>> >>
>> >> I guess a similar discussion about adding an expression compiler to
>> >> numpy has already happened when numexpr was released?
>> >> If yes, what was the outcome of that?
>> >
>> >
>> > I don't recall a discussion when numexpr was released, as that was
>> > before I read this list. numexpr does an optimization that can't be
>> > done by NumPy: fusing element-wise operations into one call. So I
>> > don't see how it could be reused in NumPy.
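A toy illustration of what that fusion buys (hypothetical code, not taken
from numexpr): evaluating out = a*b + c in a single pass avoids the
temporary array that a sequence of separate ufunc calls would materialize:

    #include <stddef.h>

    /* Fused evaluation of out = a*b + c: one pass over memory, no
     * temporary. Unfused ufunc-style evaluation would first write
     * t = a*b to a temporary array, then compute t + c, roughly
     * doubling the memory traffic. */
    void fused_mul_add(double *out, const double *a, const double *b,
                       const double *c, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            out[i] = a[i] * b[i] + c[i];
    }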
>> >
>> > You call your optimizations trivial, but I don't. In the git log of
>> > NumPy, the first commit is from 2001; this is the first time someone
>> > has done this in 12 years! Also, they give a 1.5-8x speed-up (from
>> > memory, from your PR description). That is not negligible. But how
>> > much time did you spend on them? Also, some of them are processor
>> > dependent; how many people on this list have already done this? I
>> > suppose not many.
>> >
>> > Yes, your optimizations don't cover all the cases that minivect does.
>> > I see 2 levels of optimization: 1) the inner-loop/contiguous cases,
>> > 2) the strided, broadcasted level. We don't need every optimization
>> > to be implemented for them to be useful. Any of them is useful.
>> >
>> > So what I think is that we could reuse/share that work. NumPy has C
>> > code generators. They could call the minivect code generator for some
>> > of this when compiling NumPy. That way, optimizations made to those
>> > code generators would be reused by more people. For example, when new
>> > processors are launched, only 1 place would need to change for many
>> > projects. Or, for example, if the call to the MKL vector library is
>> > done there, more people will benefit from it. Right now, only numexpr
>> > does it.
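A sketch of that idea: a single dispatch point that calls a vendor vector
library when the build provides one and falls back to a plain loop
otherwise (the HAVE_MKL flag here is a hypothetical build option; vdAdd
is MKL's VML element-wise add):

    #include <stddef.h>
    #ifdef HAVE_MKL        /* hypothetical build flag */
    #include <mkl.h>       /* declares vdAdd() */
    #endif

    /* One place to change when a new processor or vector library
     * arrives: every project sharing this dispatch point benefits. */
    void elementwise_add(double *y, const double *a,
                         const double *b, size_t n)
    {
    #ifdef HAVE_MKL
        vdAdd((MKL_INT)n, a, b, y);   /* MKL VML vectorized add */
    #else
        for (size_t i = 0; i < n; i++)
            y[i] = a[i] + b[i];
    #endif
    }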
>> >
>> > About the level 2 optimizations (strides, broadcast), I have never
>> > read the NumPy code that deals with that. Does someone who knows it
>> > have an idea whether it would be possible to reuse minivect for this?
>>
>> Would someone be able to guide some of the numpy C experts into a room
>> to do some thinking / writing on this at the scipy conference?
>>
>> I completely agree that these kinds of optimizations and code sharing
>> seem likely to be very important for the future.
>>
>> I'm not at the conference, but if there's anything I can do to help,
>> please someone let me know.
>>
>
> Concerning the future development of numpy, I'd also suggest that we look
> at libdynd <https://github.com/ContinuumIO/libdynd>. It looks to me like
> it is reaching a level of maturity where it is worth trying to plan out a
> long term path to merger.
>

I'm in Austin for SciPy, and will be giving a talk on the dynd library on
Thursday; please drop by if you can make it. I'm very interested in
cross-pollination of ideas between numpy, libdynd, blaze, and other array
programming projects. The Python exposure of dynd as it is now can
transform data to/from numpy via views very easily where the data is
compatible, and I expect libdynd and numpy to live alongside each other
for quite some time. One possible way things could work is to think of
libdynd as a more rapidly changing "playground" for functionality that
would be nice to have in numpy, without the guarantees of C-level ABI or
API backwards compatibility that numpy has, at least before libdynd 1.0.

Cheers,
Mark


>
> Chuck
>


More information about the NumPy-Discussion mailing list