[Matrix-SIG] Re: Numeric plans
Paul F. Dubois
dubois1@llnl.gov
Tue, 21 Sep 1999 16:07:38 -0700
This is a multi-part message in MIME format.
------=_NextPart_000_000B_01BF044B.6D03FF20
Content-Type: text/plain;
charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Perry et. al asked me for comments on the following:
> 1) "[Numeric] C code is a mess." (David Asher)
I invite you to peruse the source. Imagine you want to make the change
requested in #2.
I believe you will then hold this truth to be self-evident.
>
> 2) Paul Dubois mentioned the problems he was having settling on a
> consensus for how the "inverse of take" (the old scatter/gather issue)
> should be implemented when the array being indexed has rank > 1.
...
> Paul,
> is it possible to summarize this?
Paul has a day job. That is why he asked for proposals. If your proposal is
to implement x[indices] as both an r- and l-value for x one-dimensional
only, then by golly I do know what to do. But I don't necessarily know how
to do it, see #1. Egads, a circular reference, I will never be garbage
collected.
>
> 3) Are there aspects of this work (UserArray, general code clean-ups,
> or inverse of take) that others can help contribute to? If not...
People have been contributing when the spirit moved them by sending in
patches. David or I review the patches and as long as it doesn't offend our
sensibilities we put them in. We'd like to get #12 fixed after our screwup
and then would welcome things such as #2 being done for us. We both have a
lot on our plates so we are in the annoying position of locking out the
volunteer help for the moment because we are too busy.
>
> 4) A means of inheriting Numeric arrays is important to us also and we
> are very interested in how that issue is going to be solved. Is there
> any way for those of us that are interested to keep abreast of the ongoing
> design and implementation discussions?
David and I finally got a chance to talk together in person for a couple of
hours and we had some ideas about how to proceed. That is the sum total of
the "design". The general idea is to make a decent UserArray; this will
require adding some things to the core. We got a lot of experience during
the recent wild goose chase. I've been working on a facility for what I call
Masked Arrays; that is, arrays that have an optional mask that indicates
missing values. This *is* part of my day job so it is getting into pretty
good shape now. I was going to back-fit UserArray from it. I attach the
current version for your comments. This incarnation tries to let Numeric
arrays and these MA arrays coexist.
>
> Perry Greenfield
> Rick White
> Paul Barrett
>
Supposing there were a full-featured UserArray module, similar to MA but
without the mask, would it be helpful to you? Or perhaps, would it be
helpful as a reuse-by-editor form of inheritance? I guess what I think I
found out is that it is almost not really possible to inherit from Numeric
without fairly massive redefinitions. There are some tricks one might be
able to play to make a few extensions possible, such as using self.__class__
as a constructor where possible, but maybe even that isn't a good idea since
it assumes you can make a valid Foo using just data as an argument, clearly
not true in general.
For the longer term, it would be nice to get a do-over. However, there are
many different communities here with different needs and few social skills
with which to resolve the differences. (:->. This fact is easiest to see if
you think about x[i, :]. The speed burners want this to be a reference, the
usabilities want this to be a copy.
Best regards,
Paul
------=_NextPart_000_000B_01BF044B.6D03FF20
Content-Type: text/plain;
name="MA.py"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
filename="MA.py"
"""MA facility Copyright 1999 Regents of the University of California.
Released for unlimited redistribution; see file Legal.htm in Numerical =
distribution."""
__version__ =3D 2.0
#This version gives up on the Extension Class stuff
import Numeric
from Precision import *
import string, types
class MAError (Exception):
def __init__ (self, args=3DNone):
self.args =3D args
def __str__(self):
return str(self.args)
__MaxElements =3D 300 #Maximum size for printing
def setMaxElements (m):
"Set the maximum # of elements for printing arrays. "
global __MaxElements
n =3D int(m)
if n <=3D 0: raise ValueError, "setMaxElements requires positive =
argument."
__MaxElements =3D n
def getMaxElements ():
"Get the maximum # of elements for printing arrays. "
global __MaxElements
return __MaxElements
def getmask (a):
"Mask of values in a, if any; None otherwise."
if isMA(a):=20
return a.mask()
else:=20
return None
def getmaskarray (a):
"Mask of values in a, ones if none."
m =3D getmask(a)
if m is None:
return Numeric.ones(shape(a))
else:
return m
def mask_or (m1, m2):
"Logical or of the masks m1 and m2, treating None as false."
if m1 is None: return m2
if m2 is None: return m1
return Numeric.logical_or(m1, m2)
def filled (a, value =3D None):
"""
a as a Numeric array with any masked areas replaced by value"
No data copied if no masked area and data is already an array.
"""
if isMA(a):
return a.filled(value)
else:
return Numeric.array(a, copy=3D0)
class masked_unary_operation:
def __init__ (self, aufunc, fill=3D0):
self.f =3D aufunc
self.fill =3D fill
def __call__ (self, a):
m =3D getmask(a)
d1 =3D filled(a)
if m is None:
return self.f(d1)
else:
return MA(self.f(d1), m)
class masked_binary_operation:
def __init__ (self, aufunc, fill=3D0):
self.f =3D aufunc
self.fill =3D fill
def __call__ (self, a, b):
d1 =3D filled(a, self.fill)
d2 =3D filled(b, self.fill)
m =3D mask_or(getmask(a), getmask(b))
if m is None:
return self.f(d1, d2)
else:
return MA(self.f(d1,d2), m)
def reduce (self, target, axis=3D0):
m =3D getmask(target)
if m is None:
return self.f.reduce (target, axis)
else:
t =3D target.filled(self.fill)
tr =3D self.f.reduce(t, axis)
m =3D Numeric.logical_and.reduce(m, axis)
return MA(tr, m)
class comparison:
"A comparison function that returns a plain mask after doing the =
comparison"
def __init__ (self, f, fill1, fill2):
self.f =3D f
self.fill1 =3D fill1
self.fill2 =3D fill2
def __call__ (self, a, b):
return self.f(filled(a, self.fill1), filled(b, self.fill2))
# class MA =20
# the use of filled in the constructor
# makes sure we don't get MA's in data or mask=20
# This prevents recursion
# and we can assume these are array objects (or None for mask).
# Note current implementation assumes mask is 1's and 0's
# so that we can use Numeric.choose in filled.
class MA:
"Arrays with possibly masked values."
default_real_fill_value =3D 0.0
default_complex_fill_value =3D 0.0 + 0.0j
default_character_fill_value =3D ' '
default_integer_fill_value =3D 0
default_unsigned_integer_fill_value =3D 0
default_object_fill_value =3D None
def __init__(self, data, mask=3DNone, **keys):
"""MA(data, mask=3DNone)"""
self.__data =3D filled(data)
if mask is None:
self.__mask =3D None
else:
self.__mask =3D filled(mask, 1)
def __getattr__ (self, name):
if name =3D=3D 'shape':
return self.get_shape()
else:
return self.__dict__[name]
def __setattr__ (self, name, value):
if name =3D=3D 'shape':
self.set_shape(value)
else:
self.__dict__[name] =3D value
def __str__(self):
return str(self.filled())
def __repr__(self):
if self.__mask is None:
return "MA(" + repr(filled(self)) + ")"
else:
return "MA(" + repr(filled(self)) + ", \n " + \
repr(self.__mask) + ")"
def __float__(self):
return MA (float(self.filled(0.0)), self.__mask)
def __int__(self):
return MA(int(self.filled(0)), self.__mask)
# Note copy semantics here differ from Numeric =20
def __getitem__(self, i):=20
m =3D self.__mask
if m is None:
return Numeric.array(self.__data[i])
else:
return MA(Numeric.array(self.__data[i]), =
Numeric.array(m[i]))
def __getslice__(self, i, j):=20
m =3D self.__mask
if m is None:
return Numeric.array(self.__data[i:j])
else:
return MA(Numeric.array(self.__data[i:j]), =
Numeric.array(m[i:j]))
# --------
def __setitem__(self, index, value):
self.__data[index] =3D filled(value)
self.__mask =3D mask_or(self.__mask, getmask(value))
=20
def __setslice__(self, i, j, value):
self.__data[i:j] =3D filled(value)
m =3D getmask(value)
if m is not None:
sm =3D getmaskarray(self)
sm[i:j] =3D mask_or(sm[i:j], m)
self.__mask =3D sm
def __len__ (self):
"Return total number of elements in array."
return self.size()
def __cmp__ (self, other):
raise MAError, "Cannot use comparison operators on MA =
instances."
=20
def __abs__(self):=20
return masked_unary_operation(Numeric.absolute)(self)
def __neg__(self):=20
return masked_unary_operation(Numeric.negative)(self)
def __add__(self, other):=20
return masked_binary_operation(Numeric.add)(self, other)
=20
__radd__ =3D __add__
def __sub__(self, other):=20
return masked_binary_operation(Numeric.subtract,0)(self, other)
def __rsub__(self, other):=20
return masked_binary_operation(Numeric.subtract,0)(other, self)
def __mul__(self, other):=20
return masked_binary_operation(Numeric.multiply,0)(self, other)
=20
__rmul__ =3D __mul__
def __div__(self, other):=20
return divide(self, other)
def __rdiv__(self, other):=20
return divide(other, self)
def __pow__(self,other, third=3DNone):=20
if third is not None:=20
raise MAException, "3 arg power unspoorted"
return masked_binary_operation(Numeric.power, 1)(self, other)
def __sqrt__(self):=20
return masked_unary_operation(Numeric.sqrt,0)(self)
def default_fill_value (self):
"Function to calculate default fill value for an array."
x =3D self.typecode()
if x in typecodes['Float']:
return MA.default_real_fill_value
if x in typecodes['Integer']:
return MA.default_integer_fill_value
if x in typecodes['Complex']:
return MA.default_complex_fill_value
if x in typecodes['Character']:
return MA.default_character_fill_value
if x in typecodes['UnsignedInteger']:
return Numeric.absolute(MA.default_integer_fill_value)
else:
return MA.default_object_fill_value
def set_fill_value (self, v):
"Set the fill value to v."
self.fill_value =3D v
def filled (self, value=3DNone):
""" Self with masked data replaced by value.
If value is None, self.fill_value used.
self.default_fill_value () used if value is None.
Not a copy if no masked data.
Result is romised to be a Numeric.array.
"""
if self.__mask is None:
return self.__data
else:
if value is None:
if hasattr(self, 'fill_value'):
v =3D self.fill_value
else:
v =3D self.default_fill_value ()
self.fill_value =3D v
else:
v =3D value
return Numeric.choose(self.__mask, (self.__data, v))
def get_shape(self):
"Return the tuple giving the current shape."
return self.__data.shape
def ids (self):
"""Return the ids of the data and mask areas"""
return (id(self.__data), id(self.__mask))
def mask(self):
"Return the data mask, or None."
return self.__mask
=20
def set_shape (self, *sizes):
"""shape(n, m, ...) sets the shape."""
self.__data.shape =3D sizes
if self.__mask is not None:
self.__mask.shape =3D sizes
=20
def size (self, axis =3D None):
"Number of elements in array, or in a particular axis."
s =3D self.shape
if axis is None:
return reduce(Numeric.multiply, s, 1)
else:
return s[axis]
=20
def typecode(self):
return self.__data.typecode()
# these are disgusting
def is_contiguous (self):
return self.data.is_contiguous()
def byte_swapped(self):
return self.data.byte_swapped()
=20
def isMA (x):
"Is x an instance of MA?"
return isinstance(x, MA)
def masked_array (data, masked_value, typecode=3DNone, copy=3D1, =
atol=3D1.e-8, rtol=3D1.e-8, notest=3D0):
"""Create a masked array for numerical data; no mask unless =
necessary
or if notest !=3D 0
"""
abs =3D Numeric.absolute
d =3D Numeric.array(data, typecode=3Dtypecode, copy=3Dcopy)
dm =3D Numeric.less(abs(d-masked_value), atol+rtol*abs(d))
if notest or Numeric.sometrue(Numeric.ravel(dm)):
result =3D MA(d, dm)
else:
result =3D MA(d)
result.set_fill_value(masked_value)
return result
=20
def masked_object (data, masked_value, copy=3D1):
"Create a masked array of places where exactly equal to masked =
value"
dm =3D Numeric.equal(data, masked_value, copy=3Dcopy)
return MA(data, dm)
=20
def rank (a):
"Get the rank of a"
return len(shape(a))
=20
def shape (a):
"Get the shape of a"
if isMA (a):
return a.get_shape()
else:
return Numeric.shape(a)
def size (a, axis=3DNone):
"Get the number of elements in a, or along a certain axis."
if isMA (a):
return a.size(axis)
else:
s =3D Numeric.shape(a)
if axis is None:
return reduce(Numeric.multiply, s, 1)
else:
return s[axis]
def count (a, axis =3D None):
"Count of the non-masked elements." =20
m =3D getmask(a)
if m is None:
return size(a, axis)
else:
n1 =3D size(a, axis)
n2 =3D sum (m, axis)
return n1 - n2
def sum (a, axis =3D None):
"Sum of all elements, or along a certain axis"
if axis is None:
return Numeric.add.reduce(Numeric.ravel(filled(a, 0)))
else:
return Numeric.add.reduce(filled(a, 0), axis)
=20
def product (a, axis =3D None):
"Product of all elements, or along a certain axis."
if axis is None:
return Numeric.add.reduce(Numeric.ravel(filled(a, 0)))
else:
return Numeric.add.reduce(filled(a, 0), axis)
def average(a, axis=3DNone):
"Average of the nonmasked elements of a"
c =3D 1.0 * count(a, axis)
a =3D sum(a, axis)=20
return divide(a, c)
=20
NewAxis =3D Numeric.NewAxis
arange =3D Numeric.arange
ones =3D Numeric.ones
zeros =3D Numeric.zeros
array =3D Numeric.array
#minimum, maximum?
#searchsorted?
sqrt =3D masked_unary_operation(Numeric.sqrt, 0.0)
sin =3D masked_unary_operation(Numeric.sin, 0.0)
cos =3D masked_unary_operation(Numeric.cos, 0.0)
absolute =3D masked_unary_operation(Numeric.absolute, 0)
negative =3D masked_unary_operation(Numeric.negative, 0)
absolute =3D masked_unary_operation(Numeric.absolute, 0)
nonzero =3D masked_unary_operation(Numeric.nonzero, 0)
add =3D masked_binary_operation(Numeric.add, 0)
subtract =3D masked_binary_operation(Numeric.subtract, 0)
multiply =3D masked_binary_operation(Numeric.multiply, 1)
power =3D masked_binary_operation(Numeric.power, 1)
sometrue =3D masked_unary_operation(Numeric.sometrue, 0)
alltrue =3D masked_unary_operation(Numeric.alltrue, 1)
# all these comparisons return false where masks are true =20
equal =3D comparison(Numeric.equal, 0, 1)
not_equal =3D comparison(Numeric.not_equal, 0, 0)
less_equal =3D comparison(Numeric.less_equal, 1, 0)
greater_equal =3D comparison(Numeric.greater_equal, 0, 1)
less =3D comparison(Numeric.less, 1, 0)
greater =3D comparison(Numeric.greater, 0, 1)
def choose (condition, t):
"Shaped like condition, values t[0] where condition true, t[1] =
otherwise"
d =3D Numeric.choose(filled(condition,0), map(filled, t))
m =3D Numeric.choose(filled(condition,0), map(getmaskarray, t))
return MA(d, m)
def where (condition, x, y):
return choose(Numeric.not_equal(condition, 0), (y, x))
def reshape (a, newshape):
"Copy of a with a new shape."
m =3D getmask(a)
d =3D Numeric.reshape(filled(a), newshape)
if m is None:
return d
else:
return MA(d, Numeric.reshape(m, newshape))
def ravel (a):
"a as one-dimensional, may share data and mask"
m =3D getmask(a)
d =3D Numeric.ravel(filled(a)) =20
if m is None:
return d
else:
return MA(d, Numeric.ravel(m))
def concatenate (somearrays, axis=3D0):
"concatenate the arrays along the given axis"
d =3D Numeric.concatenate(map(filled, somearrays), axis)
dm =3D Numeric.concatenate(map(getmaskarray, somearrays), axis)
return MA(d, dm)
def take (a, indices, axis=3D0):
m =3D getmask(a)
d =3D filled(a)
if m is None:
return Numeric.take(d, indices, axis)
else:
return MA(Numeric.take(d, indices, axis),=20
Numeric.take(m, indices, axis))
=20
def divide (x, y):
"a/b, masked where b was masked or zero"
a =3D filled(x, 0)
b =3D filled(y, 1)
bad_elements =3D equal(b, 0)
c_mask =3D mask_or(mask_or(getmask(a), getmask(b)), bad_elements)
return MA(Numeric.divide(a, b), c_mask)
=20
# This section is stolen from a post about how to limit array printing.
def limitedArrayRepr(a, max_line_width =3D None, precision =3D None, =
suppress_small =3D None):
global __MaxElements
s =3D a.shape
elems =3D Numeric.multiply.reduce(s)
if elems > __MaxElements:
if len(s) > 1:
return 'array (%s) , type =3D %s, has %d elements' % \
(string.join(map(str, s), ","), a.typecode(), elems)
else:
return Numeric.array2string (a[:__MaxElements], =
max_line_width, precision,
suppress_small,',',0) + \
('\n + %d more elements' % (elems - __MaxElements))
else:
return Numeric.array2string (a, max_line_width, precision,=20
suppress_small,',',0)
# replace the default with a smarter function for huge arrays
original_array_repr =3D Numeric.array_repr
original_array_str =3D Numeric.array_str
Numeric.array_repr =3D limitedArrayRepr
Numeric.array_str =3D limitedArrayRepr
# fixup multiarray, as well.
import multiarray
multiarray.set_string_function(limitedArrayRepr, 0)
multiarray.set_string_function(limitedArrayRepr, 1)
if __name__ =3D=3D '__main__':
import MAtest
------=_NextPart_000_000B_01BF044B.6D03FF20
Content-Type: text/plain;
name="MAtest.py"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
filename="MAtest.py"
from MA import *
def put(*stuff):
for x in stuff:
print x,
print
def test (va, mv):
ua =3D masked_array(va, mv)
put("shape(ua), ua.shape, ua.get_shape()", shape(ua), ua.shape, =
ua.get_shape())
put("rank(ua), size(ua), size(ua,0), size(ua,1)\n",=20
rank(ua), size(ua), size(ua,0), size(ua,1))
put("mask(ua)\n", getmask(ua))
put("repr(ua)\n", repr(ua))
put("va\n", va)
put("ua\n", ua)
put ("ua.filled(-1)\n", ua.filled(-1))
put("isMA(ua)", isMA(ua))
put("isMA(va)", isMA(va))
put("ua * va", type(ua*va), '\n', ua*va)
put("va * ua", type(va*ua), '\n', va*ua)
put("ua[:,2]\n", ua[:,2])
u1 =3D ravel(ua)
put("u1", u1.filled(), u1.mask())
=20
put("u1[7]\n", u1[7])
put("u1[4:12]\n", u1[4:12])
put("cos(ua)\n", cos(ua))
put("abssolute(-ua)\n", absolute(-ua))
put("ua+ua\n", ua+ua)
put("ua**2\n", ua**2)
put("sum(ua)\n", sum(ua))
put("sum(ua, 1)\n", sum(ua, 1))
put("equal(ua, va)\n", equal(ua, va))
put("average(ua)\n", average(ua))
put("average(ua, 1)\n", average(ua, 1))
put("take(ua, (1,2))\n", take(ua, (1,2)))
put("take(ua, (1,2), 1)\n", take(ua, (1,2), 1))
va=3Dreshape(arange(12), (4,3))
mv =3D 99
va[0,1]=3Dmv
va[2,2] =3D mv
test (va, mv)
va =3D reshape(arange(12)*1.0, (4,3))
mv =3D 1.e20
va[0,1] =3D mv
va[2,2] =3D mv
test (va, mv)
# now try pathology
va=3Dreshape(arange(12)*1.0, (4,3))
mv =3D 1.e20
va[0,1] =3D mv
va[2,2] =3D mv
va[:, 2] =3D mv
test (va, mv)
a =3D arange(4, typecode=3DFloat)
b =3D arange(4, typecode=3DFloat) -2.0
am =3D MA(a, [1,0,0,0])
bm =3D MA(b, [0,0,0,1])
put("am\n", am)
put("bm\n", bm)
put("a\n", a)
put("b\n", b)
put("divide(a, b)\n", divide(a, b))
put("am / b\n", am / b)
put("a / bm\n", a / bm)
put("am / bm\n", am / bm)
ua =3D masked_array(va, mv, copy=3D0)
ub =3D MA(va, ua.mask()) #shares a mask and data
ua[1,0] =3D 22.0
uc =3D MA(ua, ua.mask()) #shares only the mask
ua.mask()[0,0] =3D 1
put("ids\n", map(lambda x: x.ids(), (ua,ub,uc)))
put("3 tests of sharing, should be all zeros\n")
put("ub[1,0] - ua[1,0]\n", ub[1,0] - ua[1, 0])
put("ua.filled(10.) - ub.filled(10.)\n", ua.filled(10.) - =
ub.filled(10.))
put("ua - uc\n", ua - uc)
wo =3D where([1,1,0,0], am, am*10.)
wo.missing_value =3D 1.e20
put("wo\n", wo)
ns =3D arange(12, typecode=3DFloat)
ms =3D masked_array(ns, 5.0)
qs =3D masked_array(ns, 100.0)
ns =3D reshape(ns, (4,3))
ms.set_shape(4,3)
qs.set_shape(4,3)
put("Should be (4,3)\n", shape(ns))
put("Should be (4,3)\n", shape(ms))
put("Should be (4,3)\n", shape(qs))
------=_NextPart_000_000B_01BF044B.6D03FF20--