[Numpy-discussion] C API questions and tricks.

Thu Aug 23 12:42:14 EDT 2001

Hi all,

I recently discovered the "clip()" function, and thought it was just
what I'd been wanting, because I need to do that a lot, and it provided
a nice notation, and I was hoping speed improvement over a couple of
where() statements.

I did some experiments, and discovered that it was, in fact, slower than
the where statements, and that the fastest way to do it is to use two
putmask() calls. (note: this changes the array in place, rather than
creating a new one)

The reason it is not faster is because it is written in Python, calling
choose(), similarly to how where() does. I decided that I could use a
fast version, so I set out to write one in C, resulting in the following
questions:

A) How do I loop through all the elements of a discontiguous array of
arbitraty dimensions? If I knew the rank ahead of time, I could just
nest some for loops and use the strides[] values. Not knowing before
hand, it seems that I should be able to do some nesting of loops using
nd and dimensions[], but I can't seem to work it out. Someone must have
come up with a nifty way to do this. Is there an existing macro or
function to do it?

B) How can I write a function that can act on any of the NumPy data
types? I currently have it written to only work with contiguous Float
arrays, which is what i need at the moment, but I'd much rather have one
for the general case.

C) I'd also like any feedback on other elements of my code (enclosed
with this email). A few comments on what the function (I've called it
fastclip) is supposed to do:

"""
fastclip(A,min,max)

changes the array, A, in place, so that all the elements less than min
are replaced by min, and all the elements greater that max are replaced
by max.

min and max can be either scalars, or anything that can be converted to
an array with the same number of elements as A (using
PyArray_ContiguousFromObject() ). If min and/or max is an array, than
the coresponding elements are used. This allows, among other things, a
way to clip to just a min or max value by calling it as:
fastclip(A,A,max) or fastclip(A,min,A).

"""

I wrote a little test script to benchmark the function, and it is much
faster that the alternatives that I have thought of:

#!/usr/bin/env python
# testing speed of where vs clip vs fastclip

from Numeric import *
from RandomArray import uniform
from NumericExtras import fastclip
import time

n = 5000

a = uniform(0,100,(n,))
b = uniform(0,100,(n,))
c = uniform(0,100,(n,))

min = 20.0
max = 80.0

print "n = %i"%(n,)

start = time.clock()
for i in range(100):
    a = clip(a,min,max)
print "clip took %f seconds"%(time.clock()-start)

start = time.clock()
for i in range(100):
    putmask(a,a < min,min)
    putmask(a,a > max,max)
print "putmask took %f seconds"%(time.clock()-start)

start = time.clock()
for i in range(100):
    fastclip(a,min,max)
print "fastclip took %f seconds"%(time.clock()-start)

Here are some results:

n = 50
clip took 0.020000 seconds
putmask took 0.050000 seconds
fastclip took 0.010000 seconds

n = 5000
clip took 0.300000 seconds
putmask took 0.230000 seconds
fastclip took 0.030000 seconds

As expected the large the array is, the better the improvement.

I'd love to here any feedbackyou can give me: I've new to writing Python
extensions, using the  Numeric API, and C itself, for that matter.

-thanks, Chris

-- 
Christopher Barker,
Ph.D.                                                           
ChrisHBarker at home.net                 ---           ---           ---
http://members.home.net/barkerlohmann ---@@       -----@@       -----@@
                                   ------@@@     ------@@@     ------@@@
Oil Spill Modeling                ------   @    ------   @   ------   @
Water Resources Engineering       -------      ---------     --------    
Coastal and Fluvial Hydrodynamics --------------------------------------
------------------------------------------------------------------------
-------------- next part --------------

#include "Python.h"
#include "arrayobject.h"

// This define is to accomidate the different ways shared libs are built (see the bottom of the file)

#define _LINUX // this shouild be one of:  _LINUX, _WIN32, or _MACOS

// These are little macros I use to access array values for specific rank arrays
#define ARRAYVAL0(aType,a) ( *(aType *)(a->data))	
#define ARRAYVAL1(aType,a,i) ( *(aType *)(a->data + (i)*a->strides[0]))	
#define ARRAYVAL2(aType,a,i,j) ( *(aType *)(a->data + (i)*a->strides[0] + (j)*a->strides[1]))	
#define ARRAYVAL3(aType,a,i,j,k) ( *(aType *)(a->data + (i)*a->strides[0] + (j)*a->strides[1] + (k)*a->strides[2]))	

// A function that computes the total number of elements in and array 
// Does this exist already?
int TotalNumberOfElements(PyArrayObject *Array)
{
  int total = 1;
  short i;

  for (i = 0; i < Array->nd; i++)
    {
      total *= Array->dimensions[i];
    }
  return total;
}

// This is the fastclip code:
// called from Python as: fastclip(Array,min,max)
// min, max can be scalar or the same size as Array.
// At the moment, Array has to be contiguous, and it only works with arrays of type Float.
static PyObject * NumericExtras_fastclip(PyObject *self, PyObject *args)
{

  int NumArray, elementsize, i;

  PyObject *InMin;
  PyObject *InMax;

  PyArrayObject *Array;
  PyArrayObject *Min;
  PyArrayObject *Max;

  //printf("I'm starting\n");

  if (!PyArg_ParseTuple(args, "O!OO", 
						&PyArray_Type, &Array,
					    &InMin,
						&InMax))
  {
    return NULL;
  }  

  // check types of input

  // check type of Array
  if (!PyArray_ISCONTIGUOUS(Array)){
	PyErr_SetString (PyExc_ValueError,
					 "a must be contiguous");
	return NULL;
  }

  if (Array->descr->type_num != PyArray_DOUBLE){
	PyErr_SetString (PyExc_ValueError,
					 "a must be a NumPy array of type Float");
	return NULL;
  }

  // convert min and max to arrays:
  // if they don't convert, the input values are not valid
  Min = (PyArrayObject *) PyArray_ContiguousFromObject(InMin, PyArray_DOUBLE, 0, 0);
  if (Min == NULL){
    PyErr_SetString (PyExc_ValueError,
					 "min must be either a scalar or the same size as a");
    return NULL;
  }
  Max = (PyArrayObject *) PyArray_ContiguousFromObject(InMax, PyArray_DOUBLE, 0, 0);
  if (Max == NULL){
    PyErr_SetString (PyExc_ValueError,
					 "max must be either a scalar or the same size as a");
    Py_DECREF(Min);
    return NULL;
  }

  // This seems pretty kludgy to have the four cases, but it was the first
  // thing that came to mind that I was sure would work, and would accomidate
  // either array or scalar arguments for min and max. 
  NumArray = TotalNumberOfElements(Array);
  elementsize = sizeof(double);

  if (TotalNumberOfElements(Min) == 1 && TotalNumberOfElements(Max) == 1){ // both limits are scalar
    for (i = 0; i < NumArray; i++) { // loop over each element
      if (*(double *)(Array->data + i*elementsize) > ARRAYVAL0(double,Max)){
         *(double *)(Array->data + i*elementsize) = ARRAYVAL0(double,Max);
          }
      else if (*(double *)(Array->data + i*elementsize) < ARRAYVAL0(double,Min)){
         *(double *)(Array->data + i*elementsize) = ARRAYVAL0(double,Min);
      }
    }
  }
  else if (TotalNumberOfElements(Min) == 1 && TotalNumberOfElements(Max) == NumArray){ // Min is scalar
    for (i = 0; i < NumArray; i++) { // loop over each element
      if (*(double *)(Array->data + i*elementsize) > *(double *)(Max->data + i*elementsize)){
         *(double *)(Array->data + i*elementsize) = *(double *)(Max->data + i*elementsize) ;
          }
      else if (*(double *)(Array->data + i*elementsize) < ARRAYVAL0(double,Min)){
         *(double *)(Array->data + i*elementsize) = ARRAYVAL0(double,Min);
      }
    }
  }    
  else if (TotalNumberOfElements(Max) == 1 && TotalNumberOfElements(Min) == NumArray){ // Max is scalar
    for (i = 0; i < NumArray; i++) { // loop over each element
      if (*(double *)(Array->data + i*elementsize) > ARRAYVAL0(double,Max)){
         *(double *)(Array->data + i*elementsize) = ARRAYVAL0(double,Max);
          }
      else if (*(double *)(Array->data + i*elementsize) < *(double *)(Min->data + i*elementsize)){
         *(double *)(Array->data + i*elementsize) = *(double *)(Min->data + i*elementsize) ;
      }
    }
  }    
  else if (TotalNumberOfElements(Min) == NumArray && TotalNumberOfElements(Max) == NumArray){ // Neither is scalar
    for (i = 0; i < NumArray; i++) { // loop over each element
      if (*(double *)(Array->data + i*elementsize) > *(double *)(Max->data + i*elementsize)){
         *(double *)(Array->data + i*elementsize) = *(double *)(Max->data + i*elementsize) ;
          }
      else if (*(double *)(Array->data + i*elementsize) < *(double *)(Min->data + i*elementsize)){
         *(double *)(Array->data + i*elementsize) = *(double *)(Min->data + i*elementsize) ;
      }
    }    
  }
  else { // The size of Min and/or Max don't match Array
    	PyErr_SetString (PyExc_ValueError,
					 "min and max must be either scalar or the same number of elements as a");
        Py_DECREF(Min);
        Py_DECREF(Max);
        return NULL;
  }

  return Py_None;
}

static PyMethodDef NumericExtrasMethods[] = {
  {"fastclip", NumericExtras_fastclip, METH_VARARGS},
  {NULL, NULL} /* Sentinel */
};

#if defined _WIN32
  void _declspec(dllexport)
#elif defined _LINUX
  void 
#elif defined _MACOS
  void
#else
  #error There is no valid platform defined (should be in platfrom.h and be one of: _WIN32,_LINUX,_MACOS)
#endif

initNumericExtras()

{
  (void) Py_InitModule("NumericExtras", NumericExtrasMethods);
  import_array()
}