[Matrix-SIG] thoughts about the future

Paul F. Dubois dubois1@llnl.gov
Mon, 10 Jan 2000 12:22:24 -0800


This is a multi-part message in MIME format.

------=_NextPart_000_0000_01BF5B65.5A1AE600
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

There are two separate concepts and NumPy attempts to do both of them:

1. Provide an "array" object that can store a large number of items of one
type in as small a space as possible.

2. Provide for rapid operations and fancy subscripting and shapes, etc.

I believe (1) belongs in the Python core. I think they should be 1-D only
with basic math operations. Both Python and C APIs would allow you to create
such arrays and to create ones that do not "own" the memory in question,
similar to the options in NumPy. Along with this the : notation would be
allowed outside of square brackets, so that r= 3:1000:10 would be a "range"
object. This allows these subscripts as function arguments, enabling
Fortran-like syntax and many other interesting applications. It is a design
question whether there is one array type with a variable data type or
different array types. I think it is really a generic parameter to the class
but Python lacks such a concept.

One or more facilities can then be build on top of (1) that satisfy
different needs. I'm convinced that no one design will ever satisfy
everyone. Those who need blinding speed and no upcasting are unlikely to
ever get along with the "safe and easy to use for my customers" group. But
for a start we could reproduce the semantics we have now.

An illustration of my reasoning is the number of "bug reports" I receive
about the fact that NumPy crashes with this or that bad input. The original
designer did not test for bad input in order not to compromise performance.
Considering this from a Design by Contract perspective, most NumPy operators
have strong preconditions. This would be fine except they aren't stated
using assertions so there is no "debug mode". We should go through NumPy and
add the assertions.

Another illustration is the x[i:j] question: is this a copy or a reference?
For non-speed users, the present "reference" semantics are terrible. But you
can hide that in a (2) facility. For reference I attach a copy of a "missing
value" class (not completely done yet) that illustrates the approach.

There are also many schools of thought about shape facilities and
subscripting in general. I have my own opinions and I'm sure each of you
does too.

I grant that it will be a pity to have more than one competing high-level
package, but if all packages share the common core array as their storage
mechanisms it ought not to be too bad to pass things back and forth.



------=_NextPart_000_0000_01BF5B65.5A1AE600
Content-Type: text/plain;
	name="MA.py"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename="MA.py"

"""MA facility Copyright 1999 Regents of the University of California.
Released for unlimited redistribution; see file Legal.htm in Numerical =
distribution."""
__version__ =3D 2.0
#This version gives up on the Extension Class stuff
import Numeric
from Precision import *
import string, types

class MAError (Exception):
    def __init__ (self, args=3DNone):
        self.args =3D args
    def __str__(self):
        return str(self.args)

__MaxElements =3D 300     #Maximum size for printing

def setMaxElements (m):
    "Set the maximum # of elements for printing arrays. "
    global __MaxElements
    n =3D int(m)
    if n <=3D 0: raise ValueError, "setMaxElements requires positive =
argument."
    __MaxElements =3D n

def getMaxElements ():
    "Get the maximum # of elements for printing arrays. "
    global __MaxElements
    return __MaxElements

def getmask (a):
    "Mask of values in a, if any; None otherwise."
    if isMA(a):=20
        return a.mask()
    else:=20
        return None

def getmaskarray (a):
    "Mask of values in a, ones if none."
    m =3D getmask(a)
    if m is None:
        return Numeric.ones(shape(a))
    else:
        return m

def mask_or (m1, m2):
    "Logical or of the masks m1 and m2, treating None as false."
    if m1 is None: return m2
    if m2 is None: return m1
    return Numeric.logical_or(m1, m2)

def filled (a, value =3D None):
    """
    a as a Numeric array with any masked areas replaced by value"
    No data copied if no masked area and data is already an array.
    """
    if isMA(a):
        return a.filled(value)
    else:
        return Numeric.array(a, copy=3D0)

class masked_unary_operation:
    def __init__ (self, aufunc, fill=3D0):
        self.f =3D aufunc
        self.fill =3D fill

    def __call__ (self, a):
        m =3D getmask(a)
        d1 =3D filled(a)
        if m is None:
            return self.f(d1)
        else:
            return MA(self.f(d1), m)

class masked_binary_operation:
    def __init__ (self, aufunc, fill=3D0):
        self.f =3D aufunc
        self.fill =3D fill

    def __call__ (self, a, b):
        d1 =3D filled(a, self.fill)
        d2 =3D filled(b, self.fill)
        m =3D mask_or(getmask(a), getmask(b))
        if m is None:
            return self.f(d1, d2)
        else:
            return MA(self.f(d1,d2), m)

    def reduce (self, target, axis=3D0):
        m =3D getmask(target)
        if m is None:
            return self.f.reduce (target, axis)
        else:
            t =3D target.filled(self.fill)
            tr =3D self.f.reduce(t, axis)
            m =3D Numeric.logical_and.reduce(m, axis)
            return MA(tr, m)

class comparison:
    "A comparison function that returns a plain mask after doing the =
comparison"
    def __init__ (self, f, fill1, fill2):
        self.f =3D f
        self.fill1 =3D fill1
        self.fill2 =3D fill2
    def __call__ (self, a, b):
        return self.f(filled(a, self.fill1), filled(b, self.fill2))


# class MA       =20
# the use of filled in the constructor
# makes sure we don't get MA's in data or mask=20
# This prevents recursion
# and we can assume these are array objects (or None for mask).
# Note current implementation assumes mask is 1's and 0's
# so that we can use Numeric.choose in filled.

class MA:
    "Arrays with possibly masked values."
    default_real_fill_value =3D 0.0
    default_complex_fill_value =3D 0.0 + 0.0j
    default_character_fill_value =3D ' '
    default_integer_fill_value =3D 0
    default_unsigned_integer_fill_value =3D 0
    default_object_fill_value =3D None

    def __init__(self, data, mask=3DNone, **keys):
        """MA(data, mask=3DNone)"""
        self.__data =3D filled(data)
        if mask is None:
            self.__mask =3D None
        else:
            self.__mask =3D filled(mask, 1)

    def __getattr__ (self, name):
        if name =3D=3D 'shape':
            return self.get_shape()
        else:
            return self.__dict__[name]

    def __setattr__ (self, name, value):
        if name =3D=3D 'shape':
            self.set_shape(value)
        else:
            self.__dict__[name] =3D value

    def __str__(self):
        return str(self.filled())

    def __repr__(self):
        if self.__mask is None:
            return "MA(" + repr(filled(self)) + ")"
        else:
            return "MA(" + repr(filled(self)) + ", \n   " + \
                           repr(self.__mask) + ")"

    def __float__(self):
        return MA (float(self.filled(0.0)), self.__mask)

    def __int__(self):
        return MA(int(self.filled(0)), self.__mask)

# Note copy semantics here differ from Numeric       =20
    def __getitem__(self, i):=20
        m =3D self.__mask
        if m is None:
            return Numeric.array(self.__data[i])
        else:
            return MA(Numeric.array(self.__data[i]), =
Numeric.array(m[i]))

    def __getslice__(self, i, j):=20
        m =3D self.__mask
        if m is None:
            return Numeric.array(self.__data[i:j])
        else:
            return MA(Numeric.array(self.__data[i:j]), =
Numeric.array(m[i:j]))
# --------
    def __setitem__(self, index, value):
        self.__data[index] =3D filled(value)
        self.__mask =3D mask_or(self.__mask, getmask(value))
=20
    def __setslice__(self, i, j, value):
        self.__data[i:j] =3D filled(value)
        m =3D getmask(value)
        if m is not None:
            sm =3D getmaskarray(self)
            sm[i:j] =3D mask_or(sm[i:j], m)
            self.__mask =3D sm

    def __len__ (self):
        "Return total number of elements in array."
        return self.size()

    def __cmp__ (self, other):
        raise MAError, "Cannot use comparison operators on MA =
instances."
=20
    def __abs__(self):=20
        return masked_unary_operation(Numeric.absolute)(self)

    def __neg__(self):=20
        return masked_unary_operation(Numeric.negative)(self)

    def __add__(self, other):=20
        return masked_binary_operation(Numeric.add)(self, other)
                       =20
    __radd__ =3D __add__

    def __sub__(self, other):=20
        return masked_binary_operation(Numeric.subtract,0)(self, other)

    def __rsub__(self, other):=20
        return masked_binary_operation(Numeric.subtract,0)(other, self)

    def __mul__(self, other):=20
        return masked_binary_operation(Numeric.multiply,0)(self, other)
   =20
    __rmul__ =3D __mul__

    def __div__(self, other):=20
        return divide(self, other)

    def __rdiv__(self, other):=20
        return divide(other, self)

    def __pow__(self,other, third=3DNone):=20
        if third is not None:=20
            raise MAException, "3 arg power unspoorted"
        return masked_binary_operation(Numeric.power, 1)(self, other)

    def __sqrt__(self):=20
        return masked_unary_operation(Numeric.sqrt,0)(self)

    def default_fill_value (self):
        "Function to calculate default fill value for an array."
        x =3D self.typecode()
        if x in typecodes['Float']:
            return MA.default_real_fill_value
        if x in typecodes['Integer']:
            return MA.default_integer_fill_value
        if x in typecodes['Complex']:
            return MA.default_complex_fill_value
        if x in typecodes['Character']:
            return MA.default_character_fill_value
        if x in typecodes['UnsignedInteger']:
            return Numeric.absolute(MA.default_integer_fill_value)
        else:
            return MA.default_object_fill_value

    def set_fill_value (self, v):
        "Set the fill value to v."
        self.fill_value =3D v

    def filled (self, value=3DNone):
        """ Self with masked data replaced by value.
            If value is None, self.fill_value used.
            self.default_fill_value () used if value is None.
            Not a copy if no masked data.
            Result is romised to be a Numeric.array.
        """
        if self.__mask is None:
            return self.__data
        else:
            if value is None:
                if hasattr(self, 'fill_value'):
                    v =3D self.fill_value
                else:
                    v =3D self.default_fill_value ()
                    self.fill_value =3D v
            else:
                v =3D value
            return Numeric.choose(self.__mask, (self.__data, v))

    def get_shape(self):
        "Return the tuple giving the current shape."
        return self.__data.shape

    def ids (self):
        """Return the ids of the data and mask areas"""
        return (id(self.__data), id(self.__mask))

    def mask(self):
        "Return the data mask, or None."
        return self.__mask
           =20
    def set_shape (self, *sizes):
        """shape(n, m, ...) sets the shape."""
        self.__data.shape =3D sizes
        if self.__mask is not None:
            self.__mask.shape =3D sizes
           =20
    def size (self, axis =3D None):
        "Number of elements in array, or in a particular axis."
        s =3D self.shape
        if axis is None:
            return reduce(Numeric.multiply, s, 1)
        else:
            return s[axis]
           =20
    def typecode(self):
        return self.__data.typecode()

# these are disgusting
    def is_contiguous (self):
        return self.data.is_contiguous()

    def byte_swapped(self):
        return self.data.byte_swapped()
   =20
def isMA (x):
    "Is x an instance of MA?"
    return isinstance(x, MA)

def masked_array (data, masked_value, typecode=3DNone, copy=3D1, =
atol=3D1.e-8, rtol=3D1.e-8, notest=3D0):
    """Create a masked array for numerical data; no mask unless =
necessary
       or if notest !=3D 0
    """
    abs =3D Numeric.absolute
    d =3D Numeric.array(data, typecode=3Dtypecode, copy=3Dcopy)
    dm =3D Numeric.less(abs(d-masked_value), atol+rtol*abs(d))
    if notest or Numeric.sometrue(Numeric.ravel(dm)):
        result =3D MA(d, dm)
    else:
        result =3D MA(d)
    result.set_fill_value(masked_value)
    return result
   =20

def masked_object (data, masked_value, copy=3D1):
    "Create a masked array of places where exactly equal to masked =
value"
    dm =3D Numeric.equal(data, masked_value, copy=3Dcopy)
    return MA(data, dm)
   =20
def rank (a):
    "Get the rank of a"
    return len(shape(a))
       =20
def shape (a):
    "Get the shape of a"
    if isMA (a):
        return a.get_shape()
    else:
        return Numeric.shape(a)

def size (a, axis=3DNone):
    "Get the number of elements in a, or along a certain axis."
    if isMA (a):
        return a.size(axis)
    else:
        s =3D Numeric.shape(a)
        if axis is None:
            return reduce(Numeric.multiply, s, 1)
        else:
            return s[axis]

def count (a, axis =3D None):
    "Count of the non-masked elements."  =20
    m =3D getmask(a)
    if m is None:
        return size(a, axis)
    else:
        n1 =3D size(a, axis)
        n2 =3D sum (m, axis)
        return n1 - n2

def sum (a, axis =3D None):
    "Sum of all elements, or along a certain axis"
    if axis is None:
        return Numeric.add.reduce(Numeric.ravel(filled(a, 0)))
    else:
        return Numeric.add.reduce(filled(a, 0), axis)
=20
def product (a, axis =3D None):
    "Product of all elements, or along a certain axis."
    if axis is None:
        return Numeric.add.reduce(Numeric.ravel(filled(a, 0)))
    else:
        return Numeric.add.reduce(filled(a, 0), axis)

def average(a, axis=3DNone):
    "Average of the nonmasked elements of a"
    c =3D 1.0 * count(a, axis)
    a =3D sum(a, axis)=20
    return divide(a, c)
   =20
NewAxis =3D Numeric.NewAxis
arange =3D Numeric.arange
ones =3D Numeric.ones
zeros =3D Numeric.zeros
array =3D Numeric.array

#minimum, maximum?
#searchsorted?

sqrt =3D masked_unary_operation(Numeric.sqrt, 0.0)
sin =3D masked_unary_operation(Numeric.sin, 0.0)
cos =3D masked_unary_operation(Numeric.cos, 0.0)
absolute =3D masked_unary_operation(Numeric.absolute, 0)
negative =3D masked_unary_operation(Numeric.negative, 0)
absolute =3D masked_unary_operation(Numeric.absolute, 0)
nonzero =3D masked_unary_operation(Numeric.nonzero, 0)

add =3D masked_binary_operation(Numeric.add, 0)
subtract =3D masked_binary_operation(Numeric.subtract, 0)
multiply =3D masked_binary_operation(Numeric.multiply, 1)
power =3D masked_binary_operation(Numeric.power, 1)

sometrue =3D masked_unary_operation(Numeric.sometrue, 0)
alltrue =3D masked_unary_operation(Numeric.alltrue, 1)

# all these comparisons return false where masks are true      =20
equal =3D comparison(Numeric.equal, 0, 1)
not_equal =3D comparison(Numeric.not_equal, 0, 0)
less_equal =3D comparison(Numeric.less_equal, 1, 0)
greater_equal =3D comparison(Numeric.greater_equal, 0, 1)
less =3D comparison(Numeric.less, 1, 0)
greater =3D comparison(Numeric.greater, 0, 1)

def choose (condition, t):
    "Shaped like condition, values t[0] where condition true, t[1] =
otherwise"
    d =3D Numeric.choose(filled(condition,0), map(filled, t))
    m =3D Numeric.choose(filled(condition,0), map(getmaskarray, t))
    return MA(d, m)

def where (condition, x, y):
    return choose(Numeric.not_equal(condition, 0), (y, x))

def reshape (a, newshape):
    "Copy of a with a new shape."
    m =3D getmask(a)
    d =3D Numeric.reshape(filled(a), newshape)
    if m is None:
        return d
    else:
        return MA(d, Numeric.reshape(m, newshape))

def ravel (a):
    "a as one-dimensional, may share data and mask"
    m =3D getmask(a)
    d =3D Numeric.ravel(filled(a))  =20
    if m is None:
        return d
    else:
        return MA(d, Numeric.ravel(m))

def concatenate (somearrays, axis=3D0):
    "concatenate the arrays along the given axis"
    d =3D Numeric.concatenate(map(filled, somearrays), axis)
    dm =3D Numeric.concatenate(map(getmaskarray, somearrays), axis)
    return MA(d, dm)

def take (a, indices, axis=3D0):
    m =3D getmask(a)
    d =3D filled(a)
    if m is None:
        return Numeric.take(d, indices, axis)
    else:
        return MA(Numeric.take(d, indices, axis),=20
                  Numeric.take(m, indices, axis))
                          =20
def divide (x, y):
    "a/b, masked where b was masked or zero"
    a =3D filled(x, 0)
    b =3D filled(y, 1)
    bad_elements =3D equal(b, 0)
    c_mask =3D mask_or(mask_or(getmask(a), getmask(b)), bad_elements)
    return MA(Numeric.divide(a, b), c_mask)
   =20
# This section is stolen from a post about how to limit array printing.

def limitedArrayRepr(a, max_line_width =3D None, precision =3D None, =
suppress_small =3D None):
    global __MaxElements
    s =3D a.shape
    elems =3D  Numeric.multiply.reduce(s)
    if elems > __MaxElements:
        if len(s) > 1:
            return 'array (%s) , type =3D %s, has %d elements' % \
                 (string.join(map(str, s), ","), a.typecode(), elems)
        else:
            return Numeric.array2string (a[:__MaxElements], =
max_line_width, precision,
                 suppress_small,',',0) + \
               ('\n + %d more elements' % (elems - __MaxElements))
    else:
        return Numeric.array2string (a, max_line_width, precision,=20
                suppress_small,',',0)


# replace the default with a smarter function for huge arrays
original_array_repr =3D Numeric.array_repr
original_array_str =3D Numeric.array_str
Numeric.array_repr =3D limitedArrayRepr
Numeric.array_str =3D limitedArrayRepr

# fixup multiarray, as well.
import multiarray
multiarray.set_string_function(limitedArrayRepr, 0)
multiarray.set_string_function(limitedArrayRepr, 1)

if __name__ =3D=3D '__main__':
        import MAtest

------=_NextPart_000_0000_01BF5B65.5A1AE600--