[Matrix-SIG] Numeric Nits

Mon, 21 Jun 1999 09:58:16 -0400 (EDT)

We are engaged in a project to use Python as the basis for a scripting
language for our existing data analysis software.  To date we have made
great progress, and recently we have begun looking at creating an
integrated environment that allows the use of Numeric along with our
old software.

In examining the Numeric capabilities, we see some significant
shortcomings that we think could be rectified in the standard
Numeric distribution.  We realize that at least some of these topics
have been raised before, but hope that this message will serve as a
starting point for resolving these apparent problems (if perhaps only
to show us that we are wrongheaded about what we perceive as
problems).  We realize some of our suggestions may be controversial.

Occasionally we have seen suggestions like ours met with responses 
something like, "If you want to do image processing and care about
efficiency, you shouldn't use Python/Numeric anyway."  We don't agree
with that, however -- image processing is in many cases the ideal
application for an interpreted language like Python, because nearly all
the compute time is spent doing vectorized calculations on millions of
pixels, and the interpreter overhead is often negligible.  Given the
right additions to Numeric (and not that many are needed), Python could
be competitive with any image/data processing language in existence for
efficiency (and better than any other as a programming language.)
We have a great deal of experience with IDL, and while the shortcomings
of the current version of Numeric keep it from being competitive with
IDL now, with a few changes Numeric could easily replace IDL for our
applications.

Here is a brief summary of the things we think are needed:

(1) Improved control over memory allocation and over promotion
    of arrays to larger types (e.g., Float32 to Float64).
(2) "put" function (inverse of take) to scatter values into a set of
    array elements.
(3) Improved or new standard functions:
    (a) C versions (for speed and memory efficiency) of more
        standard functions including: arrayrange, nonzero,
        compress, clip, where.
    (b) A new function to rebin an array, either reducing the array
        size by summing adjacent elements or increasing the size by
        repeating elements.
    (c) A new function to compute histograms for arrays using a memory-
        and CPU-efficient algorithm.

We discuss each of these in more detail below.

1. Memory allocation

A lot of our data analysis applications involve processing large
images.  There are astronomical cameras in use that produce images of
16k by 16k pixels (2**28 = 268 million pixels) or larger.  Handling
these large quantities of data is a challenge, and Numeric's tendency
to increase the number of bytes per pixel through datatype promotion
makes it even harder.

It is very difficult to keep arrays in their original datatype if they
do not correspond to python ints (32-bit) or floats (64-bit).
Operations with python literals or scalar values automatically upcast
the results.  Even operations with integers cause Float32 arrays to be
converted to Float64 -- e.g., ones(10,Float32)/2 is a Float64, which
is a very surprising result for users of other languages.

The suggested solution (using rank-0 Numeric arrays instead of simple
Python constants) often does not work because any Python operation on
those arrays turns them back into (promoted) Python scalars.  E.g., after
    a = array(1.0, Float32)
    b = -a
b is a standard Python 64-bit scalar rather than a Float32.

In the applications we have developed using Numeric, the majority of
our effort appears to be spent on trying to work around this promotion
behavior, and the resulting code is not very readable (and probably not
very maintainable either).  We wonder whether anyone has seriously used
Numeric for 16 bit ints or 32 bit floats in cases (like ours) where the
memory penalty of converting these to 32 bit ints and doubles is not
acceptable.

The nicest solution from the numerical point of view would be for
Python to support these other types.  We do not expect that to happen
(nor should it happen).

But the problem could be ameliorated by changing Numeric's default
behavior in casting constants.  If combining an array and a python
scalar of similar types (some form of int with some form of int, or
float with double or int), then the default behavior should be to cast the
python type to the array type, even if it means down casting.  
The rules for promotion of ints to floats/doubles would remain as is.

We consider this a very important issue.  The effect of the current
situation is to render Numeric types that have no corresponding Python
type virtually useless except for conversion purposes.

Another change that would make Numeric's memory use more efficient would
be better use of temporary arrays.  It is possible to avoid excess creation
of temporary arrays by using the 3-argument forms of multiply(), add(), etc.,
e.g., replacing
    b=a*(a+1)
by
    b=a+1
    multiply(a,b,b)
Not only does the second version take less memory, but it runs about
10% faster (presumably because it spends less time allocating memory.)

The question is, is there a way for Numeric to determine that one of
operands is a temporary?  It looks to us as if the reference count for
a temporary result (which will be discarded after the operation) is
smaller than for a variable operand.  If so, it should be possible for
Numeric to reuse the temporary for the output in cases where the
temporary is the same type and shape as the output.  Maybe there is
something tricky here that we don't understand, but if not it seems like
a fairly simple change would make Numeric a lot less memory-hungry.

2. Inverse of take function

We call this out as a separate item because we consider it essential to
make Numeric effective.  This topic has arisen repeatedly in the SIG,
yet nothing has come of it for various reasons which seem to include:

 1) a lack of agreement on what the general solution should be.
 2) no one willing to implement any solution, agreed upon or not.

The absence of an agreed upon general solution or someone to do the
work should not prevent the adoption and distribution of
less-than-general solutions (such as array_set, that comes with Gist)
with the standard distribution.  Surely a less-than-general solution is
preferable to none.

While an addition to the Numeric syntax that allows arrays to be
indexed by array is preferable (so that one can use an indexing syntax
rather than a function call), we can't see why the simple function call
can't be added right away even if it is not completely general and even
if it will ultimately be replaced by something more powerful and
convenient.

Perhaps progress on the more limited fronts may spur progress on the
most general solution.  While we of course would prefer someone else do
the work (partly because we do not yet consider ourselves competent to
muck around with Numeric), we are willing to do it ourselves if no one
else will. 

3. C implementations of standard functions

We were surprised that some of the most heavily used functions in
Numeric are implemented in Python rather than C, despite their apparent
simplicity.  At least some of these really need to be implemented in C
to make them decently efficient: arrayrange, nonzero, compress, clip,
where.  The current Python versions are especially costly in memory
(when working with large images) because they usually create several
temporary arrays before finally producing the result.

Note that the standard Python version of clip could be replaced by this
much more efficient version (which is about 10 times faster):

def clip(m, m_min, m_max):
    """clip(m, m_min, m_max) = every entry in m that is less than m_min is
    replaced by m_min, and every entry greater than m_max is replaced by
    m_max.
    """
    b = maximum(m, m_min)
    return minimum(b, m_max)

We also see a need for a function that rebins an array, e.g., zooming
the array by pixel replication or dezooming it by summing blocks of
pixels (not simply by slicing to select every N-th pixel.)  Both of
these would change the shape of the array by multiplying or dividing by
an integral factor.  These functions are very useful in image
processing; repeat provides a somewhat clumsy way to expand an array,
but it is not obvious how to efficiently reduce it.

There are several Python functions for computing histograms from
Numeric arrays, but there still seems to be a need for a simple,
general histogram function written in C for efficiency.  This means not
just CPU efficiency but also memory efficiency -- the Python versions
we have seen typically sort the array, which is costly in memory for
big images.

We think Numeric has great potential, and it would be a shame if the
scientific community is discouraged by the lack of a few features and
conveniences they have come to expect in array languages.

Rick White
Perry Greenfield
Paul Barrett

Space Telescope Science Institute