[Matrix-SIG] Re: Numeric plans

Tue, 21 Sep 1999 16:07:38 -0700

This is a multi-part message in MIME format.

------=_NextPart_000_000B_01BF044B.6D03FF20
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

Perry et. al asked me for comments on the following:

> 1) "[Numeric] C code is a mess." (David Asher)

I invite you to peruse the source. Imagine you want to make the change
requested in #2.
I believe you will then hold this truth to be self-evident.

>
> 2) Paul Dubois mentioned the problems he was having settling on a
> consensus for how the "inverse of take" (the old scatter/gather issue)
> should be implemented when the array being indexed has rank > 1.
...
> Paul,
> is it possible to summarize this?

Paul has a day job. That is why he asked for proposals. If your proposal is
to implement x[indices] as both an r- and l-value for x one-dimensional
only, then by golly I do know what to do. But I don't necessarily know how
to do it, see #1. Egads, a circular reference, I will never be garbage
collected.

>
> 3) Are there aspects of this work (UserArray, general code clean-ups,
> or inverse of take) that others can help contribute to? If not...

People have been contributing when the spirit moved them by sending in
patches. David or I review the patches and as long as it doesn't offend our
sensibilities we put them in. We'd like to get #12 fixed after our screwup
and then would welcome things such as #2 being done for us. We both have a
lot on our plates so we are in the annoying position of locking out the
volunteer help for the moment because we are too busy.

>
> 4) A means of inheriting Numeric arrays is important to us also and we
> are very interested in how that issue is going to be solved. Is there
> any way for those of us that are interested to keep abreast of the ongoing
> design and implementation discussions?

David and I finally got a chance to talk together in person for a couple of
hours and we had some ideas about how to proceed. That is the sum total of
the "design". The general idea is to make a decent UserArray; this will
require adding some things to the core. We got a lot of experience during
the recent wild goose chase. I've been working on a facility for what I call
Masked Arrays; that is, arrays that have an optional mask that indicates
missing values. This *is* part of my day job so it is getting into pretty
good shape now. I was going to back-fit UserArray from it. I attach the
current version for your comments. This incarnation tries to let Numeric
arrays and these MA arrays coexist.
>
> Perry Greenfield
> Rick White
> Paul Barrett
>
Supposing there were a full-featured UserArray module, similar to MA but
without the mask, would it be helpful to you? Or perhaps, would it be
helpful as a reuse-by-editor form of inheritance? I guess what I think I
found out is that it is almost not really possible to inherit from Numeric
without fairly massive redefinitions. There are some tricks one might be
able to play to make a few extensions possible, such as using self.__class__
as a constructor where possible, but maybe even that isn't a good idea since
it assumes you can make a valid Foo using just data as an argument, clearly
not true in general.

For the longer term, it would be nice to get a do-over. However, there are
many different communities here with different needs and few social skills
with which to resolve the differences. (:->.  This fact is easiest to see if
you think about x[i, :]. The speed burners want this to be a reference, the
usabilities want this to be a copy.

Best regards,

Paul

------=_NextPart_000_000B_01BF044B.6D03FF20
Content-Type: text/plain;
	name="MA.py"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename="MA.py"

"""MA facility Copyright 1999 Regents of the University of California.
Released for unlimited redistribution; see file Legal.htm in Numerical =
distribution."""
__version__ =3D 2.0
#This version gives up on the Extension Class stuff
import Numeric
from Precision import *
import string, types

class MAError (Exception):
    def __init__ (self, args=3DNone):
        self.args =3D args
    def __str__(self):
        return str(self.args)

__MaxElements =3D 300     #Maximum size for printing

def setMaxElements (m):
    "Set the maximum # of elements for printing arrays. "
    global __MaxElements
    n =3D int(m)
    if n <=3D 0: raise ValueError, "setMaxElements requires positive =
argument."
    __MaxElements =3D n

def getMaxElements ():
    "Get the maximum # of elements for printing arrays. "
    global __MaxElements
    return __MaxElements

def getmask (a):
    "Mask of values in a, if any; None otherwise."
    if isMA(a):=20
        return a.mask()
    else:=20
        return None

def getmaskarray (a):
    "Mask of values in a, ones if none."
    m =3D getmask(a)
    if m is None:
        return Numeric.ones(shape(a))
    else:
        return m

def mask_or (m1, m2):
    "Logical or of the masks m1 and m2, treating None as false."
    if m1 is None: return m2
    if m2 is None: return m1
    return Numeric.logical_or(m1, m2)

def filled (a, value =3D None):
    """
    a as a Numeric array with any masked areas replaced by value"
    No data copied if no masked area and data is already an array.
    """
    if isMA(a):
        return a.filled(value)
    else:
        return Numeric.array(a, copy=3D0)

class masked_unary_operation:
    def __init__ (self, aufunc, fill=3D0):
        self.f =3D aufunc
        self.fill =3D fill

    def __call__ (self, a):
        m =3D getmask(a)
        d1 =3D filled(a)
        if m is None:
            return self.f(d1)
        else:
            return MA(self.f(d1), m)

class masked_binary_operation:
    def __init__ (self, aufunc, fill=3D0):
        self.f =3D aufunc
        self.fill =3D fill

    def __call__ (self, a, b):
        d1 =3D filled(a, self.fill)
        d2 =3D filled(b, self.fill)
        m =3D mask_or(getmask(a), getmask(b))
        if m is None:
            return self.f(d1, d2)
        else:
            return MA(self.f(d1,d2), m)

    def reduce (self, target, axis=3D0):
        m =3D getmask(target)
        if m is None:
            return self.f.reduce (target, axis)
        else:
            t =3D target.filled(self.fill)
            tr =3D self.f.reduce(t, axis)
            m =3D Numeric.logical_and.reduce(m, axis)
            return MA(tr, m)

class comparison:
    "A comparison function that returns a plain mask after doing the =
comparison"
    def __init__ (self, f, fill1, fill2):
        self.f =3D f
        self.fill1 =3D fill1
        self.fill2 =3D fill2
    def __call__ (self, a, b):
        return self.f(filled(a, self.fill1), filled(b, self.fill2))

# class MA       =20
# the use of filled in the constructor
# makes sure we don't get MA's in data or mask=20
# This prevents recursion
# and we can assume these are array objects (or None for mask).
# Note current implementation assumes mask is 1's and 0's
# so that we can use Numeric.choose in filled.

class MA:
    "Arrays with possibly masked values."
    default_real_fill_value =3D 0.0
    default_complex_fill_value =3D 0.0 + 0.0j
    default_character_fill_value =3D ' '
    default_integer_fill_value =3D 0
    default_unsigned_integer_fill_value =3D 0
    default_object_fill_value =3D None

    def __init__(self, data, mask=3DNone, **keys):
        """MA(data, mask=3DNone)"""
        self.__data =3D filled(data)
        if mask is None:
            self.__mask =3D None
        else:
            self.__mask =3D filled(mask, 1)

    def __getattr__ (self, name):
        if name =3D=3D 'shape':
            return self.get_shape()
        else:
            return self.__dict__[name]

    def __setattr__ (self, name, value):
        if name =3D=3D 'shape':
            self.set_shape(value)
        else:
            self.__dict__[name] =3D value

    def __str__(self):
        return str(self.filled())

    def __repr__(self):
        if self.__mask is None:
            return "MA(" + repr(filled(self)) + ")"
        else:
            return "MA(" + repr(filled(self)) + ", \n   " + \
                           repr(self.__mask) + ")"

    def __float__(self):
        return MA (float(self.filled(0.0)), self.__mask)

    def __int__(self):
        return MA(int(self.filled(0)), self.__mask)

# Note copy semantics here differ from Numeric       =20
    def __getitem__(self, i):=20
        m =3D self.__mask
        if m is None:
            return Numeric.array(self.__data[i])
        else:
            return MA(Numeric.array(self.__data[i]), =
Numeric.array(m[i]))

    def __getslice__(self, i, j):=20
        m =3D self.__mask
        if m is None:
            return Numeric.array(self.__data[i:j])
        else:
            return MA(Numeric.array(self.__data[i:j]), =
Numeric.array(m[i:j]))
# --------
    def __setitem__(self, index, value):
        self.__data[index] =3D filled(value)
        self.__mask =3D mask_or(self.__mask, getmask(value))
=20
    def __setslice__(self, i, j, value):
        self.__data[i:j] =3D filled(value)
        m =3D getmask(value)
        if m is not None:
            sm =3D getmaskarray(self)
            sm[i:j] =3D mask_or(sm[i:j], m)
            self.__mask =3D sm

    def __len__ (self):
        "Return total number of elements in array."
        return self.size()

    def __cmp__ (self, other):
        raise MAError, "Cannot use comparison operators on MA =
instances."
=20
    def __abs__(self):=20
        return masked_unary_operation(Numeric.absolute)(self)

    def __neg__(self):=20
        return masked_unary_operation(Numeric.negative)(self)

    def __add__(self, other):=20
        return masked_binary_operation(Numeric.add)(self, other)
                       =20
    __radd__ =3D __add__

    def __sub__(self, other):=20
        return masked_binary_operation(Numeric.subtract,0)(self, other)

    def __rsub__(self, other):=20
        return masked_binary_operation(Numeric.subtract,0)(other, self)

    def __mul__(self, other):=20
        return masked_binary_operation(Numeric.multiply,0)(self, other)
   =20
    __rmul__ =3D __mul__

    def __div__(self, other):=20
        return divide(self, other)

    def __rdiv__(self, other):=20
        return divide(other, self)

    def __pow__(self,other, third=3DNone):=20
        if third is not None:=20
            raise MAException, "3 arg power unspoorted"
        return masked_binary_operation(Numeric.power, 1)(self, other)

    def __sqrt__(self):=20
        return masked_unary_operation(Numeric.sqrt,0)(self)

    def default_fill_value (self):
        "Function to calculate default fill value for an array."
        x =3D self.typecode()
        if x in typecodes['Float']:
            return MA.default_real_fill_value
        if x in typecodes['Integer']:
            return MA.default_integer_fill_value
        if x in typecodes['Complex']:
            return MA.default_complex_fill_value
        if x in typecodes['Character']:
            return MA.default_character_fill_value
        if x in typecodes['UnsignedInteger']:
            return Numeric.absolute(MA.default_integer_fill_value)
        else:
            return MA.default_object_fill_value

    def set_fill_value (self, v):
        "Set the fill value to v."
        self.fill_value =3D v

    def filled (self, value=3DNone):
        """ Self with masked data replaced by value.
            If value is None, self.fill_value used.
            self.default_fill_value () used if value is None.
            Not a copy if no masked data.
            Result is romised to be a Numeric.array.
        """
        if self.__mask is None:
            return self.__data
        else:
            if value is None:
                if hasattr(self, 'fill_value'):
                    v =3D self.fill_value
                else:
                    v =3D self.default_fill_value ()
                    self.fill_value =3D v
            else:
                v =3D value
            return Numeric.choose(self.__mask, (self.__data, v))

    def get_shape(self):
        "Return the tuple giving the current shape."
        return self.__data.shape

    def ids (self):
        """Return the ids of the data and mask areas"""
        return (id(self.__data), id(self.__mask))

    def mask(self):
        "Return the data mask, or None."
        return self.__mask
           =20
    def set_shape (self, *sizes):
        """shape(n, m, ...) sets the shape."""
        self.__data.shape =3D sizes
        if self.__mask is not None:
            self.__mask.shape =3D sizes
           =20
    def size (self, axis =3D None):
        "Number of elements in array, or in a particular axis."
        s =3D self.shape
        if axis is None:
            return reduce(Numeric.multiply, s, 1)
        else:
            return s[axis]
           =20
    def typecode(self):
        return self.__data.typecode()

# these are disgusting
    def is_contiguous (self):
        return self.data.is_contiguous()

    def byte_swapped(self):
        return self.data.byte_swapped()
   =20
def isMA (x):
    "Is x an instance of MA?"
    return isinstance(x, MA)

def masked_array (data, masked_value, typecode=3DNone, copy=3D1, =
atol=3D1.e-8, rtol=3D1.e-8, notest=3D0):
    """Create a masked array for numerical data; no mask unless =
necessary
       or if notest !=3D 0
    """
    abs =3D Numeric.absolute
    d =3D Numeric.array(data, typecode=3Dtypecode, copy=3Dcopy)
    dm =3D Numeric.less(abs(d-masked_value), atol+rtol*abs(d))
    if notest or Numeric.sometrue(Numeric.ravel(dm)):
        result =3D MA(d, dm)
    else:
        result =3D MA(d)
    result.set_fill_value(masked_value)
    return result
   =20

def masked_object (data, masked_value, copy=3D1):
    "Create a masked array of places where exactly equal to masked =
value"
    dm =3D Numeric.equal(data, masked_value, copy=3Dcopy)
    return MA(data, dm)
   =20
def rank (a):
    "Get the rank of a"
    return len(shape(a))
       =20
def shape (a):
    "Get the shape of a"
    if isMA (a):
        return a.get_shape()
    else:
        return Numeric.shape(a)

def size (a, axis=3DNone):
    "Get the number of elements in a, or along a certain axis."
    if isMA (a):
        return a.size(axis)
    else:
        s =3D Numeric.shape(a)
        if axis is None:
            return reduce(Numeric.multiply, s, 1)
        else:
            return s[axis]

def count (a, axis =3D None):
    "Count of the non-masked elements."  =20
    m =3D getmask(a)
    if m is None:
        return size(a, axis)
    else:
        n1 =3D size(a, axis)
        n2 =3D sum (m, axis)
        return n1 - n2

def sum (a, axis =3D None):
    "Sum of all elements, or along a certain axis"
    if axis is None:
        return Numeric.add.reduce(Numeric.ravel(filled(a, 0)))
    else:
        return Numeric.add.reduce(filled(a, 0), axis)
=20
def product (a, axis =3D None):
    "Product of all elements, or along a certain axis."
    if axis is None:
        return Numeric.add.reduce(Numeric.ravel(filled(a, 0)))
    else:
        return Numeric.add.reduce(filled(a, 0), axis)

def average(a, axis=3DNone):
    "Average of the nonmasked elements of a"
    c =3D 1.0 * count(a, axis)
    a =3D sum(a, axis)=20
    return divide(a, c)
   =20
NewAxis =3D Numeric.NewAxis
arange =3D Numeric.arange
ones =3D Numeric.ones
zeros =3D Numeric.zeros
array =3D Numeric.array

#minimum, maximum?
#searchsorted?

sqrt =3D masked_unary_operation(Numeric.sqrt, 0.0)
sin =3D masked_unary_operation(Numeric.sin, 0.0)
cos =3D masked_unary_operation(Numeric.cos, 0.0)
absolute =3D masked_unary_operation(Numeric.absolute, 0)
negative =3D masked_unary_operation(Numeric.negative, 0)
absolute =3D masked_unary_operation(Numeric.absolute, 0)
nonzero =3D masked_unary_operation(Numeric.nonzero, 0)

add =3D masked_binary_operation(Numeric.add, 0)
subtract =3D masked_binary_operation(Numeric.subtract, 0)
multiply =3D masked_binary_operation(Numeric.multiply, 1)
power =3D masked_binary_operation(Numeric.power, 1)

sometrue =3D masked_unary_operation(Numeric.sometrue, 0)
alltrue =3D masked_unary_operation(Numeric.alltrue, 1)

# all these comparisons return false where masks are true      =20
equal =3D comparison(Numeric.equal, 0, 1)
not_equal =3D comparison(Numeric.not_equal, 0, 0)
less_equal =3D comparison(Numeric.less_equal, 1, 0)
greater_equal =3D comparison(Numeric.greater_equal, 0, 1)
less =3D comparison(Numeric.less, 1, 0)
greater =3D comparison(Numeric.greater, 0, 1)

def choose (condition, t):
    "Shaped like condition, values t[0] where condition true, t[1] =
otherwise"
    d =3D Numeric.choose(filled(condition,0), map(filled, t))
    m =3D Numeric.choose(filled(condition,0), map(getmaskarray, t))
    return MA(d, m)

def where (condition, x, y):
    return choose(Numeric.not_equal(condition, 0), (y, x))

def reshape (a, newshape):
    "Copy of a with a new shape."
    m =3D getmask(a)
    d =3D Numeric.reshape(filled(a), newshape)
    if m is None:
        return d
    else:
        return MA(d, Numeric.reshape(m, newshape))

def ravel (a):
    "a as one-dimensional, may share data and mask"
    m =3D getmask(a)
    d =3D Numeric.ravel(filled(a))  =20
    if m is None:
        return d
    else:
        return MA(d, Numeric.ravel(m))

def concatenate (somearrays, axis=3D0):
    "concatenate the arrays along the given axis"
    d =3D Numeric.concatenate(map(filled, somearrays), axis)
    dm =3D Numeric.concatenate(map(getmaskarray, somearrays), axis)
    return MA(d, dm)

def take (a, indices, axis=3D0):
    m =3D getmask(a)
    d =3D filled(a)
    if m is None:
        return Numeric.take(d, indices, axis)
    else:
        return MA(Numeric.take(d, indices, axis),=20
                  Numeric.take(m, indices, axis))
                          =20
def divide (x, y):
    "a/b, masked where b was masked or zero"
    a =3D filled(x, 0)
    b =3D filled(y, 1)
    bad_elements =3D equal(b, 0)
    c_mask =3D mask_or(mask_or(getmask(a), getmask(b)), bad_elements)
    return MA(Numeric.divide(a, b), c_mask)
   =20
# This section is stolen from a post about how to limit array printing.

def limitedArrayRepr(a, max_line_width =3D None, precision =3D None, =
suppress_small =3D None):
    global __MaxElements
    s =3D a.shape
    elems =3D  Numeric.multiply.reduce(s)
    if elems > __MaxElements:
        if len(s) > 1:
            return 'array (%s) , type =3D %s, has %d elements' % \
                 (string.join(map(str, s), ","), a.typecode(), elems)
        else:
            return Numeric.array2string (a[:__MaxElements], =
max_line_width, precision,
                 suppress_small,',',0) + \
               ('\n + %d more elements' % (elems - __MaxElements))
    else:
        return Numeric.array2string (a, max_line_width, precision,=20
                suppress_small,',',0)

# replace the default with a smarter function for huge arrays
original_array_repr =3D Numeric.array_repr
original_array_str =3D Numeric.array_str
Numeric.array_repr =3D limitedArrayRepr
Numeric.array_str =3D limitedArrayRepr

# fixup multiarray, as well.
import multiarray
multiarray.set_string_function(limitedArrayRepr, 0)
multiarray.set_string_function(limitedArrayRepr, 1)

if __name__ =3D=3D '__main__':
        import MAtest

------=_NextPart_000_000B_01BF044B.6D03FF20
Content-Type: text/plain;
	name="MAtest.py"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename="MAtest.py"

from MA import *
def put(*stuff):
    for x in stuff:
        print x,
    print

def test (va, mv):
    ua =3D masked_array(va, mv)
    put("shape(ua), ua.shape, ua.get_shape()", shape(ua), ua.shape, =
ua.get_shape())
    put("rank(ua), size(ua), size(ua,0), size(ua,1)\n",=20
         rank(ua), size(ua), size(ua,0), size(ua,1))
    put("mask(ua)\n", getmask(ua))
    put("repr(ua)\n", repr(ua))
    put("va\n", va)
    put("ua\n", ua)
    put ("ua.filled(-1)\n", ua.filled(-1))
    put("isMA(ua)", isMA(ua))
    put("isMA(va)", isMA(va))
    put("ua * va", type(ua*va), '\n', ua*va)
    put("va * ua", type(va*ua), '\n', va*ua)

    put("ua[:,2]\n", ua[:,2])
    u1 =3D ravel(ua)
    put("u1", u1.filled(), u1.mask())
   =20
    put("u1[7]\n", u1[7])
    put("u1[4:12]\n", u1[4:12])

    put("cos(ua)\n", cos(ua))
    put("abssolute(-ua)\n", absolute(-ua))
    put("ua+ua\n", ua+ua)
    put("ua**2\n", ua**2)
    put("sum(ua)\n", sum(ua))
    put("sum(ua, 1)\n", sum(ua, 1))
    put("equal(ua, va)\n", equal(ua, va))
    put("average(ua)\n", average(ua))
    put("average(ua, 1)\n", average(ua, 1))
    put("take(ua, (1,2))\n", take(ua, (1,2)))
    put("take(ua, (1,2), 1)\n", take(ua, (1,2), 1))

va=3Dreshape(arange(12), (4,3))
mv =3D 99
va[0,1]=3Dmv
va[2,2] =3D mv
test (va, mv)

va =3D reshape(arange(12)*1.0, (4,3))
mv =3D 1.e20
va[0,1] =3D mv
va[2,2] =3D mv
test (va, mv)

# now try pathology
va=3Dreshape(arange(12)*1.0, (4,3))
mv =3D 1.e20
va[0,1] =3D mv
va[2,2] =3D mv
va[:, 2] =3D mv
test (va, mv)

a =3D arange(4, typecode=3DFloat)
b =3D arange(4, typecode=3DFloat) -2.0
am =3D MA(a, [1,0,0,0])
bm =3D MA(b, [0,0,0,1])
put("am\n", am)
put("bm\n", bm)
put("a\n", a)
put("b\n", b)
put("divide(a, b)\n", divide(a, b))
put("am / b\n", am / b)
put("a / bm\n", a / bm)
put("am / bm\n", am / bm)

ua =3D masked_array(va, mv, copy=3D0)
ub =3D MA(va, ua.mask()) #shares a mask and data
ua[1,0] =3D 22.0
uc =3D MA(ua, ua.mask()) #shares only the mask
ua.mask()[0,0] =3D 1
put("ids\n", map(lambda x: x.ids(), (ua,ub,uc)))
put("3 tests of sharing, should be all zeros\n")
put("ub[1,0] - ua[1,0]\n", ub[1,0] - ua[1, 0])
put("ua.filled(10.) - ub.filled(10.)\n", ua.filled(10.) - =
ub.filled(10.))
put("ua - uc\n", ua - uc)
wo =3D where([1,1,0,0], am, am*10.)
wo.missing_value =3D 1.e20
put("wo\n", wo)
ns =3D arange(12, typecode=3DFloat)
ms =3D masked_array(ns, 5.0)
qs =3D masked_array(ns, 100.0)
ns =3D reshape(ns, (4,3))
ms.set_shape(4,3)
qs.set_shape(4,3)
put("Should be (4,3)\n", shape(ns))
put("Should be (4,3)\n", shape(ms))
put("Should be (4,3)\n", shape(qs))

------=_NextPart_000_000B_01BF044B.6D03FF20--