[Numpy-discussion] Re: [Matrix-SIG] An Experiment in code-cleanup.

Tue Feb 8 15:10:39 EST 2000

Travis Oliphant writes:
 > > 
 > > 1) The re-use of temporary arrays -- to conserve memory.
 > 
 > Please elaborate about this request.

When Python evaluates the expression:

>>> Y = B*X + A

where A, B, X, and Y are all arrays, B*X creates a temporary array, T.
A new array, Y, will be created to hold the result of T + A, and T
will be deleted.  If T and Y have the same shape and typecode, then
instead of creating Y, T can be re-used to conserve memory.
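
As a rough sketch of the idea, the same evaluation can be written out by
hand.  This assumes a present-day NumPy, where ufuncs accept an `out=`
argument and the typecode is called a dtype; neither is part of the
request above, and the array values are made up for illustration.

```python
import numpy as np

# Illustrative operands; all share one shape and dtype.
A = np.full(4, 1.0)
B = np.full(4, 2.0)
X = np.arange(4, dtype=float)

# Step 1: B*X allocates the temporary T.
T = np.multiply(B, X)

# Step 2: write T + A back into T's own buffer instead of allocating Y.
# This is only safe because T and the result have the same shape and dtype.
Y = np.add(T, A, out=T)

# Y and T are now the same object, so no second array was created.
reused = Y is T
```

The request is for the interpreter or the array module to do step 2
automatically whenever the temporary is about to be discarded.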

 > > 
 > > 2) A copy-on-write option -- to enhance performance.
 > > 
 > 
 > I need more explanation of this as well.

This would be an advanced feature for arrays that are memory-mapped or
that read their data from disk.  It is similar to the secondary cache
of a CPU: the data is held in memory, and nothing is copied until a
write request is made.
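
One way to picture this is the copy-on-write mode of the standard
library's mmap module.  The sketch below is only an illustration of the
general technique, not a proposed array API; the temporary file stands in
for a hypothetical on-disk array.

```python
import mmap
import os
import tempfile

# Hypothetical data file standing in for an on-disk array.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"\x00" * 16)
    os.close(fd)

    with open(path, "r+b") as f:
        # ACCESS_COPY: reads come from the file; writes affect only
        # the in-memory mapping and are never written back.
        view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
        before = view[:4]
        view[:4] = b"\x01\x02\x03\x04"   # modifies memory only
        after = view[:4]
        view.close()

    with open(path, "rb") as f:
        on_disk = f.read(4)              # the file itself is unchanged
finally:
    os.remove(path)
```

Readers of the mapping pay no copying cost; only a writer changes its own
in-memory view of the data.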

 > >
 > > 3) The initialization of arrays by default -- to help novices.
 > 
 > What kind of initialization are you talking about (we have zeros and ones
 > and random already)?

For mixed-type (or object) arrays containing strings, zeros() and
ones() would be confusing.  Therefore, by default, integer and
floating-point types would be initialized to 0 and string types to ' ',
with an option to leave the array uninitialized for performance.

 > > 
 > > 4) The creation of a standard API -- which I guess is assumed, if it
 > >    is to be part of the Python standard distribution.
 > 
 > Any suggestions as to what needs to be changed in the already somewhat
 > standard API?

No, not exactly.  But the last time I looked, I thought some
improvements could be made to it.

 > > 
 > > 5) The inclusion of IEEE support.
 > 
 > This was supposed to be there from the beginning, but it didn't get
 > finished.  Jim's original idea was to have two math modules, one which
 > checked and gave errors for 1/0 and another that returned IEEE inf for
 > 1/0. 
 > 
 > The current umath does both with different types which is annoying. 

When I last spoke to Jim about this at IPC6, I was under the
impression that IEEE support was not fully implemented and much work 
still needed to be done.  Has this situation changed since then?

 > > 
 > >    And
 > > 
 > > 6) Enhanced support for mixed-types or objects.
 > > 
 > > This last issue is very important to me and the astronomical community,
 > > since we routinely store data as (multi-dimensional) arrays of fixed
 > > length records or C-structures.  A current deficiency of NumPy is that
 > > the object typecode does not work with the fromstring() method, so
 > > importing arrays of records from a binary file is just not possible.
 > > I've been developing my own C-extension type to handle this situation
 > > and have come to realize that my record type is really just a
 > > generalization of NumPy's types.  
 > 
 > 
 > I would like to see the code for your generalized type which would help me
 > see if there were some relatively painless way the two could be merged.

recordmodule.c is part of my PyFITS module for dealing with FITS
files.  You can find it here:

   ftp://ra.stsci.edu/pub/barrett/PyFITS_0.3.tgz

I use NumPy to access fixed-type arrays and the record type for
accessing mixed-type arrays.  A common example is extracting the second
element of a mixed type (i.e. an object) across the entire array.  This
returns a record type with a single element, which is equivalent to a
NumPy array of fixed type.  Users therefore expect this object to be a
NumPy array, but it isn't, and they have to convert it to one.
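
To make the record idea concrete without reference to recordmodule.c,
here is a minimal sketch using only the standard struct module.  The
record layout (a 32-bit integer, a 32-bit float and a 10-byte string,
little-endian, unpadded) and the sample values are assumptions chosen for
illustration; they are not taken from PyFITS.

```python
import struct

# Assumed record layout: int32, float32, 10-byte string.
record = struct.Struct("<if10s")

# Build a small binary buffer of three records, as if read from a file.
rows = [(1, 0.5, b"alpha"), (2, 1.5, b"beta"), (3, 2.5, b"gamma")]
raw = b"".join(record.pack(*row) for row in rows)

# Decode every fixed-length record from the buffer.
records = list(record.iter_unpack(raw))

# The second element of every record forms a homogeneous column,
# i.e. what a user would expect to get back as a fixed-type array.
second_field = [rec[1] for rec in records]
```

Here second_field is a plain list; the point of the proposal is that the
equivalent operation on a record array should yield a NumPy array
directly.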

 > > two C-extension types merged.  I think this enhancement can be done
 > > with minimal change to the current NumPy behavior and minor changes to
 > > the typecode system.
 > 
 > If you already see how to do it, then great.

Note that NumPy already has some support for an Object type.  It has
been proposed that it be removed, because it is not well supported and
hence few people use it.  I have the contrary opinion and feel we
should enhance the Object type and make it much more usable.  If you
don't need it, then you don't have to use it.  This enhancement really
shouldn't get in the way of those who only use fixed-type arrays.

So what changes to NumPy are needed?

1) Instead of a typecode (or in addition to the typecode for backward
   compatibility), I suggest an optional format keyword, which can be
   used to specify the mixed-type or object format.  Namely, format =
   'i, f, s10', where 'i' is an integer type, 'f' a floating point
   type, and 's10' is a string of 10 characters.

2) Array access will be the same as it is now.  For example

   #  Create a 10x10 mixed-type array.
   A = array((10, 10), format = 'i, f, s10')
   #  Create a 10x10 fixed-type array.
   B = array((10, 10), typecode = 'i')

   #  Print a 5x5 subarray of mixed-type.
   print A[:5,:5]

   #  Print a 5x5 subarray of fixed-type
   print B[:5,:5]
   #  Or 
   #  (Note that the 3rd index is optional for fixed-type arrays; it
   #  always defaults to 0.)
   print B[:5,:5,0]

   #  Print the second element of the mixed-type of the entire array.
   #  Note that this is now an array of fixed-type.
   print A[:,:,1]

   The major thorn that I see at this point is how to reconcile the
   behavior of numbers and strings during operations.  But I don't see 
   this as an intractable problem.

   I actually believe this enhancement will encourage us to create a
   better and more generic multi-dimensional array module by
   concentrating on the behavioral aspects of this extension type.

   Note that J, which NumPy is based upon, allows such mixed-types.
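
For comparison, the access pattern in item 2 can be approximated with the
structured dtypes found in present-day NumPy.  This is not the API
proposed above: there is no format keyword, a field is selected by name
rather than by a trailing index, and the automatic field names 'f0',
'f1', 'f2' come from NumPy, not from this proposal.

```python
import numpy as np

# Structured dtype: 32-bit int, 32-bit float, 10-byte string.
# NumPy names the fields 'f0', 'f1', 'f2' automatically.
A = np.zeros((10, 10), dtype="i4, f4, S10")   # mixed-type array
B = np.zeros((10, 10), dtype="i4")            # fixed-type array

sub_mixed = A[:5, :5]      # 5x5 subarray, still mixed-type
sub_fixed = B[:5, :5]      # 5x5 subarray of fixed type

# Selecting one field across the whole array yields an ordinary
# fixed-type array (here float32), with no conversion step.
second = A["f1"]
```

Note that np.zeros also initializes every field, numeric and string
alike, which touches on the default-initialization request in item 3 of
the earlier list.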

-- 
Dr. Paul Barrett       Space Telescope Science Institute
Phone: 410-516-6714    DESD/DPT
FAX:   410-516-8615    Baltimore, MD 21218