[Numpy-discussion] How to efficiently reduce a 1-dim 100-10000 element array with user defined binary function

Kim Hansen slaunger at gmail.com
Fri Nov 14 20:22:52 EST 2008


Dear numpy-discussion,

I am quite new to Python and numpy.

I am working on a Python application using Scipy, where I need to
unpack and pack quite large amounts of data (GBs) into data structures
and convert them into other data structures. Until now the program has
been running amazingly efficiently (processing 10-125 MB/s depending
on the task) because I have managed to use only built-in array
functions for all computationally intensive operations (and because I
have fairly good hardware to run it on).

However, I have run into problems now, as I need to compute new
checksums for modified data structures using a ones' complement add on
one-dimensional uint16 arrays with 50-10000 elements.

The best procedure I have come up with so far is the following implementation:

def complement_add_uint16(a, b):
    """
    Return the ones' complement addition of two unsigned 16-bit integers.

    The values are added and, if the carry bit is set, the result is
    incremented (end-around carry).

    It is assumed that a and b are both in the range [0; 0xFFFF].
    """
    c = a + b
    c += c >> 16  # fold the carry bit back into the low 16 bits
    return c & 0xFFFF

def complement_add_checksum(ints):
    """
    Return a ones' complement checksum of a one-dimensional
    array of dtype=uint16.
    """
    return reduce(complement_add_uint16, ints, 0)
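[Editorial sketch, not part of the original post.] One way to stay vectorized here is the standard deferred-carry trick (as in RFC 1071): sum all 16-bit words in a wider integer dtype so no carries are lost, then fold the accumulated carry bits back into the low 16 bits afterwards. This relies on ones' complement addition being associative, so the end-around carries can be applied once at the end instead of after every element. A sketch:

```python
import numpy as np

def complement_add_checksum_vec(ints):
    """Vectorized ones' complement checksum of a uint16 array (sketch)."""
    # Accumulate in 64 bits so per-element carries are preserved;
    # a uint64 can absorb on the order of 2**48 words before overflowing.
    c = int(np.asarray(ints, dtype=np.uint64).sum())
    # Fold the accumulated carries back into the low 16 bits
    # (end-around carry); only a few iterations are ever needed.
    while c >> 16:
        c = (c & 0xFFFF) + (c >> 16)
    return c
```

This replaces the Python-level reduce loop with a single numpy sum plus a short scalar fold, which should be dramatically faster on arrays of this size.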

This works, but it is very slow compared to the other operations I
have implemented in numpy.

I have profiled a typical use case of my application with profile.run(...).
The profiler output shows that 88% of the time in the application is spent on
the reduce(complement_add_uint16, ints, 0) statement, and 95% of that
time is spent solely on calls to the binary complement_add_uint16 function.

I have elaborate unit tests for both functions, so I am not afraid of
experimenting; it is only my experience and numpy knowledge that are lacking...

Is there a smart way to optimize this procedure, while staying in numpy?

I am working on a Windows box, and I do not have a C, C++ or Fortran
compiler installed nor am I particularly knowledgeable in programming
in these languages (I come from a Java world).

It is not that I am lazy; it is just that the program has to run on a
Linux box as well, and I would like to avoid too much tinkering with
platform-specific compilers.

I have actually already posted a thread about this on comp.lang.python
at a time when I had not yet felt the full impact of the poor
performance:
http://groups.google.dk/group/comp.lang.python/browse_thread/thread/2f0e7ee3ad76d5a3/e2eae3719c6e3fe8#e2eae3719c6e3fe8
where I got some advanced advice (among other things, to visit this
forum). It is a little bit over my head though (for the moment), and I
was wondering if there is a simpler way?

Best wishes,
Slaunger
