Loopless syntax for 2d in NumPy (or Numarray)

Mon Oct 20 00:44:31 EDT 2003

"Terry Reedy" <tjreedy at udel.edu> wrote in message news:<VfqdnZ2_Z8mclg6iRVn-vA at comcast.com>...
> "2mc" <mcrider at bigfoot.com> wrote in message
> news:500a4565.0310191108.55e52b1d at posting.google.com...
> > Assume a multidimensional array (2d).  This would be like a
> > spreadsheet of rows and columns.  Further, assume hundreds of 'rows'
> > and 3 columns. Suppose I want a running list of the highest value in a
> > single column for 20 'rows'.  So, starting at 'row' 19, the answer
> > would be the highest value from 'row' 0 to 'row' 19.  Then, at 'row'
> > 20, the answer would be the highest value from 'row' 1 to 'row' 20.
> > And, so on.  Further, suppose I want this value for each 'column'.
> > The result would be a 3 'column' array with 19 fewer rows than the
> > source array and would contain a running list of highest values of
> > each column for the last 20 rows.
> >
> > How would this be done without loops?  Or, at least without looping
> > through every row.
> 
> Just curious: is this a real problem, or one you made up to stump
> NumPy?  

Yes, this is a real problem.  Of course, it involves much more than
this.  But, yes, I would like to get a running list of highest values
within a range of values.
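
To make the problem concrete, here is a minimal sketch of the running
maximum described above. It assumes a current NumPy imported as np; the
function name running_max and its window parameter are made up for
illustration. It loops over the 20 offsets inside the window rather than
over every row, so each pass is a whole-array operation:

```python
import numpy as np

def running_max(data, window=20):
    """Column-wise maximum over each run of `window` consecutive rows.

    Hypothetical helper: the result has (window - 1) fewer rows than `data`.
    """
    data = np.asarray(data)
    n_out = data.shape[0] - window + 1
    if n_out < 1:
        raise ValueError("need at least `window` rows")
    # Start with the first row of every window, then fold in the rows
    # at offsets 1 .. window-1 with an element-wise maximum.
    result = data[:n_out].copy()
    for offset in range(1, window):
        np.maximum(result, data[offset:offset + n_out], out=result)
    return result
```

With a 10 x 3 array and window=4, the result has 7 rows, and row 0 holds
the column maxima of source rows 0 through 3.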

> Keep in mind that NumPy was written to do typical array operations, linear 
> algebra/analysis, ffts, and even some things not so typical.  A moving 
> maximum is highly nonlinear, unusual, and likely to need explicit looping to 
> be done efficiently.
> 
> I think the way to avoid redoing the max from scratch with each move
> of the window is to use a heap augmented by a circular index that
> enables easy replacement of the departing number with the incoming
> number.  After the replacement, re-establish the heap property and
> record the max.  (For more on heaps, see heapq.py in the lib or
> various algorithm books.)

I didn't know there was a module called heapq.  I have 3 books:
Learning Python, Python Essential Reference, and Practical Python.  I
also have read the Numerical Python manual on the numerical Python
website.  None of these mentioned it.

I have a program that was written in a programming language called SPL
(Smart Programming Language).  It is a Pascal-like scripting language
that worked with an older suite of programs called "Smartware 2000." 
It is like Visual Basic for Applications except that it isn't OO.  In
this language I use multidimensional arrays of quite large sizes and
explicitly declare loops to accomplish a lot of what I'm doing.  I
keep the arrays entirely in RAM for speed purposes.  But, because it
was a program designed to work with a suite of applications (though it
does not have to use any of them - just like Visual Basic) it has a
lot of overhead that slows the program down.

So, I want to port this program to something else that will speed the
process up.  In investigating Python one of the comments I read about
it was that it was slower than other languages but easier to program -
that is, it was truly a rapid application development tool.  Because it was
slow, an extension module was created to provide true multidimensional
arrays with a computational speed only slightly slower than C++.  So,
I thought to myself that I would sacrifice some speed through the
normal parts of the program for the gain of computational speed using
this multidimensional array extension in those parts needing it and
also for the ease of application design.  My concern is that I might
inadvertently write the program in such a way as to slow it down
rather than take advantage of the modules that help with speed.
The language is just different enough that I'm having trouble
thinking in NumPy.

Your suggestion to use heapq was interesting.  Basically, in my
current program I use something similar.  I sort, enter new data into
oldest data's spot, re-sort, find the max, and repeat.
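
As a rough sketch of the heap idea for a single column: the version
below does not replace the departing number in place, as suggested
above, because heapq has no direct call for that. Instead it uses lazy
deletion: every value is pushed with its row index, and stale entries
are discarded only when they surface at the top of the heap. The name
running_max_heap is hypothetical.

```python
import heapq

def running_max_heap(values, window=20):
    """Running maximum of one column using heapq with lazy deletion."""
    heap = []   # entries are (-value, index); heapq is a min-heap,
                # so negating puts the largest value at heap[0]
    out = []
    for i, v in enumerate(values):
        heapq.heappush(heap, (-v, i))
        # Drop entries whose index has fallen out of the window.
        # The entry just pushed is always valid, so the heap never empties.
        while heap[0][1] <= i - window:
            heapq.heappop(heap)
        if i >= window - 1:
            out.append(-heap[0][0])
    return out
```

For example, running_max_heap([1, 3, 2, 5, 4, 0], window=3) gives
[3, 5, 5, 5].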

I have a couple of questions about heapq.  The website says that it is
an array but that it works on lists.  So, heapq would not work on
NumPy arrays, right?  And, it is then a normal Python array rather
than a NumPy array, right? Is there a comparable speed increase using
heapq over explicit Python loops as there is in NumPy over explicit
Python loops?
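
For what it's worth, a small sketch of how I understand heapq would be
used with data that starts out in an array: the heap itself is an
ordinary Python list, so an array column is converted first. This
assumes a current NumPy imported as np, and it says nothing about
relative speed.

```python
import heapq
import numpy as np

column = np.array([4.0, 1.0, 7.0, 3.0])

heap = column.tolist()     # plain Python list of floats
heapq.heapify(heap)        # rearranged in place into a min-heap

smallest = heap[0]                     # heap[0] is always the smallest item
largest = heapq.nlargest(1, heap)[0]   # largest item, found by scanning
```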

I guess I'm looking for the fastest way in Python or in any Python
module to accomplish what I want to do.

Thanks for your response.  I appreciate it.

Matt