Numpy Performance
Peter Otten
__peter__ at web.de
Thu Apr 23 12:14:35 EDT 2009
timlash wrote:
> Still fairly new to Python. I wrote a program that used a class
> called RectangularArray as described here:
>
> class RectangularArray:
>     def __init__(self, rows, cols, value=0):
>         self.arr = [None]*rows
>         self.row = [value]*cols
>     def __getitem__(self, (i, j)):
>         return (self.arr[i] or self.row)[j]
>     def __setitem__(self, (i, j), value):
>         if self.arr[i]==None: self.arr[i] = self.row[:]
>         self.arr[i][j] = value
>
> This class was found in a 14 year old post:
> http://www.python.org/search/hypermail/python-recent/0106.html
>
> This worked great and let me process a few hundred thousand data
> points with relative ease. However, I soon wanted to start sorting
> arbitrary portions of my arrays and to transpose others. I turned to
> Numpy rather than reinventing the wheel with custom methods within the
> serviceable RectangularArray class. However, once I refactored with
> Numpy I was surprised to find that the execution time for my program
> doubled! I expected a purpose built array module to be more efficient
> rather than less.
>
> I'm not doing any linear algebra with my data. I'm working with
> rectangular datasets, evaluating individual rows, grouping, sorting
> and summarizing various subsets of rows.
>
> Is a Numpy implementation overkill for my data handling uses? Should
> I evaluate prior array modules such as Numeric or Numarray? Are there
> any other modules suited to handling tabular data? Would I be best
> off expanding the RectangularArray class for the few data
> transformation methods I need?
>
> Any guidance or suggestions would be greatly appreciated!
Do you have many rows containing only zeros? That might be the reason why
your self-made approach shows better performance: it never allocates a row
until a value is written into it, whereas a NumPy array stores every element
up front.
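For readers on Python 3 (where tuple parameters in `def` are gone), here is a
sketch of the quoted class with the unpacking moved into the method bodies;
the names follow the original post. The point is the laziness: a real row
list is only allocated on first write, so mostly-default data stays cheap.

```python
class RectangularArray:
    """Lazy 2-D array: all rows share one default row until written to."""
    def __init__(self, rows, cols, value=0):
        self.arr = [None] * rows      # None means "row never written"
        self.row = [value] * cols     # shared default row

    def __getitem__(self, key):
        i, j = key                    # Python 3: unpack inside the body
        return (self.arr[i] or self.row)[j]

    def __setitem__(self, key, value):
        i, j = key
        if self.arr[i] is None:       # copy the default row on first write
            self.arr[i] = self.row[:]
        self.arr[i][j] = value

a = RectangularArray(1000, 1000)
a[2, 3] = 7
print(a[2, 3])                        # 7
print(a[5, 5])                        # 0 -- untouched row, no storage cost
print(sum(r is not None for r in a.arr))  # 1 -- only one row materialized
```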
Googling for "numpy sparse" finds:
http://www.scipy.org/SciPy_Tutorial
Maybe one of the sparse matrix implementations in scipy works for you.
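For example (a minimal sketch, assuming SciPy is installed; `lil_matrix` is
the format intended for incremental, row-wise construction):

```python
# A scipy.sparse LIL matrix stores only the entries you actually set,
# much like the lazy RectangularArray stores only the written rows.
from scipy.sparse import lil_matrix

m = lil_matrix((1000, 1000))   # logically 1000x1000, but nothing stored yet
m[2, 3] = 7.0
m[2, 4] = 1.5

print(m.nnz)                   # 2 -- only two entries actually stored
row = m.getrow(2).toarray()    # densify a single row when you need it
print(row[0, 3], row[0, 4])    # 7.0 1.5
```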
Peter
More information about the Python-list mailing list