[Numpy-discussion] custom accumulators

Fri Jan 5 17:01:26 EST 2007

On 05/01/07, Charles R Harris <charlesr.harris at gmail.com> wrote:

> On 1/5/07, Tim Hochberg <tim.hochberg at ieee.org> wrote:
> > Matt Knox wrote:
> > > Basically, I'd like to be able to do accumulate operations with custom
> > > functions. numpy.vectorize does not seem to provide an accumulate method
> > > with the functions it returns.

If I understand correctly, you want to be able to write a python
function that combines two scalars and returns another, then apply it
(via vectorize or some such) like a ufunc's accumulate, or like
python's reduce (for lists; for some reason the analogous function for
iterators doesn't seem to exist). Have you tried using reduce?
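
For concreteness, here is a minimal sketch of the reduce approach in current
Python, where functools.reduce accepts any iterable and itertools.accumulate
keeps the intermediate results. The combine() function and the sample values
are made up for illustration; they are not from the original question.

```python
from functools import reduce
from itertools import accumulate


def combine(x, y):
    """Hypothetical scalar function: combines two scalars into one."""
    return 2 * x + y


values = [1, 2, 3, 4]

# reduce folds the sequence from the left and keeps only the final result.
total = reduce(combine, values)

# accumulate performs the same left fold but yields every intermediate result.
running = list(accumulate(values, combine))

print(total)    # 26
print(running)  # [1, 4, 11, 26]
```

Both run the combining function in python once per element, so this is about
convenience, not speed.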

> > Note that if you are looking for speed, numpy.vectorize is probably not
> > what you are looking for even if it did work for this sort of stuff.

In fact, anything that goes through python code for the "combine two
scalars" step will be slow. Looping in python is slow not because
python's looping constructs are slow, but because executing python
code is slow. So vectorize is kind of a cheat - it doesn't
actually run fast, but it is convenient.
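
One way to see this is to count how often the wrapped function actually runs.
The sketch below uses a made-up add_one() function and a plain dict as a call
counter; otypes is passed so that vectorize does not need a trial call to
guess the output type.

```python
import numpy as np

calls = {"count": 0}  # hypothetical counter, only for this illustration


def add_one(x):
    """Trivial scalar function that records each invocation."""
    calls["count"] += 1
    return x + 1.0


vec_add_one = np.vectorize(add_one, otypes=[float])
result = vec_add_one(np.arange(5.0))

# The python function is invoked at least once per array element.
print(calls["count"], result)
```

The per-element call into python is where the time goes, whichever construct
drives the loop.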

[text deleted]

> I think what he needs is something like a linear prediction code or an IIR
> filter. The place to look would be in scipy, either in signal processing or
> statistics (ARMA). I don't know that it is there, but it might (should) be.

I don't think so. As he said in his original post, he's looking for a
general function to produce this from any function which combines two
scalars to produce another, whether it's been implemented in signal
processing or not.

It would be fairly straightforward to add this ability to the
functions returned from vectorize(); it's implemented in python, IIRC,
so one would just turn it into a class (SoftUfunc?) (with a __call__)
and add an accumulate() method. It won't avoid python looping, but
neither does vectorize, and in any case it can't be fast (even if the
provided function were implemented in C, calling back and forth through
python would probably be a slowdown). And *convenience* is actually
one of the biggest advantages of ufuncs, fancy indexing, and all that
numpy whatnot.
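
A rough sketch of what such a class could look like follows. SoftUfunc is
only the name floated above, not an existing numpy class, and this version
makes simplifying assumptions: accumulate() handles 1-D input only and keeps
the input dtype for its output.

```python
import numpy as np


class SoftUfunc:
    """Hypothetical wrapper giving a python scalar function a ufunc-like API."""

    def __init__(self, func):
        self.func = func
        self._vectorized = np.vectorize(func)

    def __call__(self, *args):
        # Elementwise application, delegated to numpy.vectorize.
        return self._vectorized(*args)

    def accumulate(self, values):
        # Left fold that keeps every intermediate result (1-D input only).
        values = np.asarray(values)
        if values.ndim != 1:
            raise ValueError("this sketch only handles 1-D input")
        # Assumption: results fit the input dtype; otherwise they are cast.
        out = np.empty_like(values)
        if values.size == 0:
            return out
        out[0] = values[0]
        for i in range(1, values.size):
            out[i] = self.func(out[i - 1], values[i])
        return out


running_max = SoftUfunc(lambda a, b: a if a > b else b)
print(running_max.accumulate([3, 1, 4, 1, 5]))          # [3 3 4 4 5]
print(running_max(np.array([1, 5]), np.array([4, 2])))  # [4 5]
```

The loop in accumulate() is plain python, so this buys the interface and
nothing in speed.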

That said, it might also be worth looking at whether numexpr can do
this sort of thing - it's supposed to take a numpy expression and
evaluate it efficiently in C (avoiding intermediate arrays and so on).

Finally, if you're careful, you can in fact abuse slicing and
augmented assignment to do this efficiently:

In [9]: a = ones(10)

In [10]: a[1:]+=a[:-1]

In [11]: a
Out[11]: array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.])

Unfortunately, this cannot really be adapted to anything but the
operators with augmented assignment and things with an output argument
- which includes ufuncs, but not, alas, vectorize.
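
A caveat on the trick: it depends on the overlapping slices being updated in
place, front to back. As far as I know, newer NumPy releases (1.13 and later)
treat overlapping input and output as if the input had been copied first, so
the same statement may no longer produce a running sum there. For anything
that is already a ufunc, the accumulate method gives the same result without
relying on that behaviour; a small sketch:

```python
import numpy as np

a = np.ones(10)

# Running sum via the ufunc's own accumulate method.
running_sum = np.add.accumulate(a)

# The same works for any binary ufunc, e.g. a running product.
running_product = np.multiply.accumulate(np.arange(1.0, 6.0))

print(running_sum)      # 1, 2, ..., 10
print(running_product)  # 1, 2, 6, 24, 120
```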

Really it would be nice if what vectorize() returned were effectively
a ufunc, supporting all the various operations we might want from a
ufunc (albeit inefficiently). This should not be difficult, but I am
not up to writing it this evening.
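
For what it's worth, numpy.frompyfunc appears to get part of the way there:
it wraps a python function in a real ufunc object, and accumulate works on it
if you ask for object dtype. A minimal sketch, with a made-up decay()
function standing in for the user's scalar function (a simple recursive
filter step):

```python
import numpy as np


def decay(prev, x):
    """Hypothetical combining function: half the previous output plus x."""
    return 0.5 * prev + x


# frompyfunc(func, nin, nout) returns a ufunc operating on python objects.
soft_decay = np.frompyfunc(decay, 2, 1)

values = np.ones(4)

# dtype=object is needed because the wrapped function works on objects.
running = soft_decay.accumulate(values, dtype=object).astype(float)

print(running)  # [1.    1.5   1.75  1.875]
```

It still calls back into python for every element, so it is no faster than
vectorize.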

A. M. Archibald