[Numpy-discussion] numarray-1.0 Bug Alert

Todd Miller jmiller at stsci.edu
Tue Jul 13 10:42:04 EDT 2004


Overview

There is a bug in numarray's Numeric-compatible C-API.  The bug has been
latent for a long time, since numarray-0.3 was released roughly two
years ago.  It is serious because it produces wrong answers when
certain extension functions are fed a certain class of arrays.

What's affected

The bug affects numarray's add-on packages and third-party
extension functions which use the Numeric compatibility C-API.
Generally, this means C-code that was either ported from Numeric or was
written with both Numeric and numarray in mind.  This includes the
add-on packages numarray.linear_algebra,  numarray.fft,
numarray.random_array, and numarray.mlab.  More recently, it includes
the ports of core Numeric functions to numarray.numeric.  Because
numarray.ma uses numarray.numeric,  the bug also affects numarray.ma. 
Finally, for numarray-1.0 this bug affects the functions numarray.argmin
and numarray.argmax; these should be the only two functions in core
numarray which are affected.

Detailed Bug Description

The bug is exposed by calling an extension function (written using the
Numeric compatible C-API) with an array that has a non-zero _byteoffset
attribute.  Arrays with non-zero _byteoffset are typically created as a
result of partially indexing higher dimensional arrays or slicing
arrays.  Partially indexing or slicing an array generally results in a
sub-array, a view which often refers to an interior region of the
original array buffer.  Because numarray's PyArrayObject does not
currently include its ->byteoffset in its ->data pointer, as the Numeric
compatibility API assumes it does, an extension function sees the base
region of the original array rather than the region belonging to the
sub-array.
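
To make the failure mode concrete, here is a minimal pure-Python sketch of
it.  It does not use numarray at all: ArrayView, numeric_style_sum and
offset_aware_sum are hypothetical stand-ins, and the struct module plays the
role of the C code reading through a pointer.  It only assumes what is
described above, namely that the data pointer refers to the start of the base
buffer and the offset is stored separately.

```python
import struct

ITEM = struct.calcsize("d")  # bytes per double-precision element


class ArrayView:
    """Hypothetical stand-in for an array object (not numarray's real type)."""

    def __init__(self, buffer, byteoffset, length):
        self.buffer = buffer          # plays the role of ->data (base buffer start)
        self.byteoffset = byteoffset  # plays the role of ->byteoffset
        self.length = length          # number of elements in the view


def read_elements(buffer, start, length):
    """Unpack `length` doubles beginning `start` bytes into `buffer`."""
    return list(struct.unpack_from("%dd" % length, buffer, start))


def numeric_style_sum(view):
    # Numeric-style code assumes the data pointer already points at the
    # view's first element, so it never looks at byteoffset.
    return sum(read_elements(view.buffer, 0, view.length))


def offset_aware_sum(view):
    # Code that knows about the separate offset adds it explicitly.
    return sum(read_elements(view.buffer, view.byteoffset, view.length))


# Base buffer holding 0.0 .. 5.0, thought of as a 2x3 array.
base = struct.pack("6d", 0.0, 1.0, 2.0, 3.0, 4.0, 5.0)

# The second row is a view starting three elements into the same buffer.
second_row = ArrayView(base, byteoffset=3 * ITEM, length=3)

wrong = numeric_style_sum(second_row)  # sums 0, 1, 2 (the base region)
right = offset_aware_sum(second_row)   # sums 3, 4, 5 (the sub-array)
```

The Numeric-style reader silently returns the sum of the first row, which is
the "wrong answers" symptom: nothing crashes, the function just looks at the
wrong region.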

Immediate User Workaround

A simple user-level workaround for people who need to use the affected
packages and functions today is one like the following:

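import numarray

def make_safe_for_numeric_api(a):
	# Accept any array-like input and treat it as a numarray array.
	a = numarray.asarray(a)
	# A non-zero _byteoffset is what exposes the bug, so hand back a copy.
	if a._byteoffset != 0:
		return a.copy()
	else:
		return a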

The array inputs to an affected extension function need to be wrapped
with calls to make_safe_for_numeric_api().  Since this is intrusive and
a real fix should be released in the near future, this approach is not
recommended.
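
If several affected functions need this treatment, one way to keep the
wrapping in a single place is a small decorator.  The following is only a
sketch: wrap_array_args, FakeArray, fake_make_safe and affected_function are
hypothetical names, and FakeArray exists solely so the example runs without
numarray installed.  With numarray present, make_safe_for_numeric_api would
be passed in instead of fake_make_safe.  It assumes every positional argument
is an array; functions taking other positional arguments would need a more
selective wrapper.

```python
import functools


def wrap_array_args(make_safe):
    """Build a decorator that passes each positional argument through make_safe."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return func(*[make_safe(a) for a in args], **kwargs)
        return wrapper
    return decorator


class FakeArray:
    """Stand-in for a numarray array, used only to make the sketch runnable."""

    def __init__(self, values, byteoffset=0):
        self.values = list(values)
        self._byteoffset = byteoffset

    def copy(self):
        # The copy is modelled as owning its own buffer, hence offset zero.
        return FakeArray(self.values, 0)


def fake_make_safe(a):
    # Same decision as make_safe_for_numeric_api, minus numarray.asarray().
    return a.copy() if a._byteoffset != 0 else a


seen_offsets = []


@wrap_array_args(fake_make_safe)
def affected_function(a):
    # Hypothetical extension function; records the offset it was handed.
    seen_offsets.append(a._byteoffset)
    return sum(a.values)


result = affected_function(FakeArray([3, 4, 5], byteoffset=24))
```

The wrapped function only ever sees arrays whose offset is zero, at the cost
of a copy for every sliced or partially indexed input.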

Long Term Fix

The real fix for the bug appears to be to redefine the semantics of
numarray's PyArrayObject ->data pointer to include ->byteoffset,
altering the C-API.  This should make most existing Numeric-compatible
extension functions work without modification or recompilation, but
will necessitate the recompilation of some extension functions written
using the native numarray API approaches (the NA_* functions and
macros).  This recompilation will be required because key macros will
change, most notably NA_OFFSETDATA.  This fix is not the only possible
one, and other suggestions are welcome, but changing the semantics of
->data appears to be the best way to facilitate numarray/Numeric
interoperability.  With this fix, numarray behaves more like Numeric,
so future ports of Numeric code to numarray should need fewer changes.
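
As a rough illustration of why the change helps Numeric-style code but forces
a rebuild of native code, here is a small pure-Python model.  Byte positions
in a packed buffer stand in for C pointers.  The function stale_offsetdata is
hypothetical: it assumes, purely for illustration, that an old-style macro
computed "data plus byteoffset"; the real NA_OFFSETDATA definition is not
reproduced here.

```python
import struct

ITEM = struct.calcsize("d")  # bytes per double-precision element
base = struct.pack("6d", 0.0, 1.0, 2.0, 3.0, 4.0, 5.0)


def first_element(buffer, start):
    """Read the double located `start` bytes into `buffer`."""
    return struct.unpack_from("d", buffer, start)[0]


# A sub-array beginning at element 3 of the base buffer.
byteoffset = 3 * ITEM

old_data = 0                      # current semantics: data = base buffer start
new_data = old_data + byteoffset  # proposed semantics: data includes byteoffset


def numeric_style(data):
    # Numeric-style code reads straight from the data pointer.
    return first_element(base, data)


def stale_offsetdata(data, offset):
    # Hypothetical model of native code compiled against the old convention:
    # it still adds the offset itself.
    return data + offset


numeric_old = numeric_style(old_data)  # reads element 0: the bug
numeric_new = numeric_style(new_data)  # reads element 3: fixed, no code change

# Old-convention native code handed the new data pointer applies the offset
# twice, which is why such extensions have to be rebuilt.
stale_start = stale_offsetdata(new_data, byteoffset)
```

In this model the Numeric-style reader becomes correct with no change on its
side, while the stale native computation lands at twice the intended offset
until it is recompiled against the new macros.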

Impact of Proposed Fix

Regrettably, the proposed fix will break binary compatibility for
clients of the numarray-1.0 native C-API.  So, extensions built using
the numarray native C-API will need to be rebuilt for numarray-1.1. 
Extensions that access PyArrayObject's ->data directly and
require the original offsetless meaning will also need code changes
for numarray-1.1.  Breaking compatibility is something we *really*
wanted to avoid, but it cannot be avoided this time.

The Plan

The current plan is to fix the Numeric compatible API by changing the
semantics of ->data and release numarray-1.1 relatively soon, hopefully
within 2 weeks.   I'm sorry for any inconvenience this has caused
numarray users.

Regards,
Todd Miller




