[Matrix-SIG] Numeric Nits

Konrad Hinsen hinsen@cnrs-orleans.fr
Tue, 22 Jun 1999 12:41:33 +0200


> Occasionally we have seen suggestions like ours met with responses 
> something like, "If you want to do image processing and care about
> efficiency, you shouldn't use Python/Numeric anyway."  We don't agree
> with that, however -- image processing is in many cases the ideal
> application for an interpreted language like Python, because nearly all
> the compute time is spent doing vectorized calculations on millions of

Did you try the Python Imaging Library (PIL)? It contains specialized
array-like objects for dealing with images. Ideally NumPy and PIL
would use a common low-level array object, but we don't live in a
perfect world!

> It is very difficult to keep arrays in their original datatype if they
> do not correspond to python ints (32-bit) or floats (64-bit).

Indeed. Unfortunately there is a conflict of interest between
different applications and different types of users. NumPy's current
behaviour is as compatible as possible with the behaviour of standard
Python data types, which I think is an important advantage.

> The suggested solution (using rank-0 Numeric arrays instead of simple
> Python constants) often does not work because any Python operation on
> those arrays turns them back into (promoted) Python scalars.  E.g., after
>     a = array(1.0, Float32)
>     b = -a
> b is a standard Python 64-bit scalar rather than a Float32.

That sounds like a problem that can be solved easily. Instead of using
rank-0 arrays, one could use special "Float32 Python scalars", which
would of course have to be implemented, presumably in NumPy, but that
is not much work. You would still have to create such objects with
somewhat clumsy syntax (e.g. Float32(3.5)), because extension modules
can't introduce new syntax rules, but there would be no more problems
with constant upcasting once you have made sure that all your data is
of Float32 type.
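A minimal sketch of such a wrapper in plain Python (the class below is hypothetical, not part of NumPy; a real implementation would live in C and cover the full set of arithmetic operations):

```python
# Hypothetical "Float32 scalar" wrapper: arithmetic results stay wrapped
# instead of decaying to a standard Python (double precision) float.

class Float32:
    """Toy scalar that remembers it is meant to be single precision."""

    def __init__(self, value):
        self.value = float(value)

    def __neg__(self):
        return Float32(-self.value)

    def __add__(self, other):
        if isinstance(other, Float32):
            other = other.value
        return Float32(self.value + other)

    __radd__ = __add__

    def __repr__(self):
        return "Float32(%r)" % self.value

a = Float32(1.0)
b = -a     # b is still a Float32, not a plain Python float
```

The syntax Float32(3.5) is exactly the "somewhat clumsy" constructor mentioned above, but within an expression the wrapped type then propagates by itself.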

Another solution to the upcasting problem could be the introduction of
a variant of Float32 arrays that are higher up in the upcasting
hierarchy than Float64. Then mixed expressions would automatically end
up in Float32. Whoever uses this type would of course have to watch
out for precision loss, but it seems to me that in your kind of
application this is not a problem.

> very maintainable either).  We wonder whether anyone has seriously used
> Numeric for 16 bit ints or 32 bit floats in cases (like ours) where the
> memory penalty of converting these to 32 bit ints and doubles is not
> acceptable.

Probably not!

> The nicest solution from the numerical point of view would be for
> Python to support these other types.  We do not expect that to happen
> (nor should it happen).

Why not? As long as you don't expect to get special syntax for
literals of the new type, this looks like a relatively simple
solution (see above).

> But the problem could be ameliorated by changing Numeric's default
> behavior in casting constants.  If combining an array and a python
> scalar of similar types (some form of int with some form of int, or
> float with double or int), then the default behavior should be to cast the
> python type to the array type, even if it means down casting.  

But this would be incompatible with both the current NumPy behaviour
(possibly breaking existing code) and with standard Python behaviour,
causing headaches to new users. I think that the principle "never lose
precision without clearly saying that you want it" is a good one.

> The question is, is there a way for Numeric to determine that one of
> operands is a temporary?  It looks to us as if the reference count for
> a temporary result (which will be discarded after the operation) is
> smaller than for a variable operand.  If so, it should be possible for
> Numeric to reuse the temporary for the output in cases where the
> temporary is the same type and shape as the output.  Maybe there is
> something tricky here that we don't understand, but if not it seems like
> a fairly simple change would make Numeric a lot less memory-hungry.

Working with reference counts is almost by definition something tricky.
I won't comment on this idea (which looks interesting, but too
difficult to verify quickly), but why don't you try it out in plain
Python, by creating a Python wrapper around array objects which
implements the optimizations you propose? (You can get the reference
count via the function sys.getrefcount().)

[about "put" function]

> The absence of an agreed upon general solution or someone to do the
> work should not prevent the adoption and distribution of
> less-than-general solutions (such as array_set, that comes with Gist)
> with the standard distribution.  Surely a less-than-general solution is
> preferable to none.

Except that it would create a future obligation to maintain
compatibility. Perhaps NumPy should contain a special module with
"quick and dirty" code which might be replaced by incompatible
improvements in the future.

> We were surprised that some of the most heavily used functions in
> Numeric are implemented in Python rather than C, despite their apparent
> simplicity.  At least some of these really need to be implemented in C
> to make them decently efficient: arrayrange, nonzero, compress, clip,
> where.  The current Python versions are especially costly in memory

Perhaps no one else ever had efficiency problems with them; I
certainly didn't!

> We also see a need for a function that rebins an array, e.g., zooming
> the array by pixel replication or dezooming it by summing blocks of
> pixels (not simply by slicing to select every N-th pixel.)  Both of

The first can be done by repeat() (at least along one axis). The second
problem may be what the mysterious reduceat() operation does, although
nobody seems to understand it well enough to decide!
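For zooming by replication, repeat() applied once per axis does the job. A sketch (shown with the modern NumPy descendant of Numeric, whose repeat() semantics are the same for this call):

```python
# Zoom a 2x2 image by a factor of 2 via pixel replication:
# repeat along the rows, then along the columns.
import numpy as np

img = np.array([[1, 2],
                [3, 4]])

zoomed = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
# zoomed is
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```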

But as in any case of presumed missing functionality, I am sure that
a code contribution would be happily accepted by the maintainers...

> an integral factor.  These functions are very useful in image
> processing; repeat provides a somewhat clumsy way to expand an array,
> but it is not obvious how to efficiently reduce it.

How about reshaping and adding? For a 1D array for example:

  a = array([2, 3, 1, 7, 3, -8])
  b = add.reduce(reshape(a, (3, 2)), -1)   # b is now [5, 8, -5]
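The same reshape-and-add trick extends to 2D dezooming: fold each factor-f block into extra axes and sum those axes away. A sketch (with NumPy's add.reduce standing in for Numeric's):

```python
# Dezoom a 2x4 image by a factor of 2: each output pixel is the sum of a
# 2x2 block of input pixels.
import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])
f = 2
h, w = a.shape

# Shape (h//f, f, w//f, f): axes 1 and 3 index positions within a block.
blocks = np.reshape(a, (h // f, f, w // f, f))
b = np.add.reduce(np.add.reduce(blocks, 3), 1)
# b sums each 2x2 block: [[1+2+5+6, 3+4+7+8]] = [[14, 22]]
```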

> There are several Python functions for computing histograms from
> Numeric arrays, but there still seems to be a need for a simple,
> general histogram function written in C for efficiency.  This means not

I have never found a need for a C function. Please try the histogram
class in version 2 of my ScientificPython package (available at
ftp://dirac.cnrs-orleans.fr/pub/); it does not sort the array
and doesn't create huge intermediates. It has performed well for
all my applications.
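The sort-free idea behind such a histogram can be sketched in a few lines (this is an illustration with NumPy, not the ScientificPython class itself; clamping out-of-range values into the edge bins is an assumption of this sketch):

```python
# Histogram without sorting: map each value to a bin index, then count.
import numpy as np

def histogram(data, nbins, lo, hi):
    """Count values into nbins equal-width bins on [lo, hi)."""
    data = np.asarray(data, float)
    idx = ((data - lo) * nbins / (hi - lo)).astype(int)
    idx = np.clip(idx, 0, nbins - 1)   # clamp out-of-range values
    counts = np.zeros(nbins, int)
    np.add.at(counts, idx, 1)          # scatter-add one count per value
    return counts

print(histogram([0.1, 0.2, 0.25, 0.9], 2, 0.0, 1.0))   # -> [3 1]
```

The only intermediate of the size of the input is the index array, and the data is never sorted.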
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------