Numpy Performance
Peter Otten
__peter__ at web.de
Thu Apr 23 12:14:35 EDT 2009
timlash wrote:
> Still fairly new to Python. I wrote a program that used a class
> called RectangularArray as described here:
>
> class RectangularArray:
>     def __init__(self, rows, cols, value=0):
>         self.arr = [None]*rows
>         self.row = [value]*cols
>     def __getitem__(self, (i, j)):
>         return (self.arr[i] or self.row)[j]
>     def __setitem__(self, (i, j), value):
>         if self.arr[i]==None: self.arr[i] = self.row[:]
>         self.arr[i][j] = value
>
> This class was found in a 14 year old post:
> http://www.python.org/search/hypermail/python-recent/0106.html
>
> This worked great and let me process a few hundred thousand data
> points with relative ease. However, I soon wanted to start sorting
> arbitrary portions of my arrays and to transpose others. I turned to
> Numpy rather than reinventing the wheel with custom methods within the
> serviceable RectangularArray class. However, once I refactored with
> Numpy I was surprised to find that the execution time for my program
> doubled! I expected a purpose built array module to be more efficient
> rather than less.
>
> I'm not doing any linear algebra with my data. I'm working with
> rectangular datasets, evaluating individual rows, grouping, sorting
> and summarizing various subsets of rows.
>
> Is a Numpy implementation overkill for my data handling uses? Should
> I evaluate prior array modules such as Numeric or Numarray? Are there
> any other modules suited to handling tabular data? Would I be best
> off expanding the RectangularArray class for the few data
> transformation methods I need?
>
> Any guidance or suggestions would be greatly appreciated!
Do you have many rows containing only zeros? That might be the reason why
your self-made approach shows better performance: it never allocates a row
until a value is written into it, whereas a NumPy array stores every element
up front.
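For readers on Python 3 (where tuple parameters in `def` are gone), here is a
sketch of the quoted class with the unpacking moved into the method bodies;
the names follow the original post. The point is the laziness: a real row
list is only allocated on first write, so mostly-default data stays cheap.

```python
class RectangularArray:
    """Lazy 2-D array: all rows share one default row until written to."""
    def __init__(self, rows, cols, value=0):
        self.arr = [None] * rows      # None means "row never written"
        self.row = [value] * cols     # shared default row

    def __getitem__(self, key):
        i, j = key                    # Python 3: unpack inside the body
        return (self.arr[i] or self.row)[j]

    def __setitem__(self, key, value):
        i, j = key
        if self.arr[i] is None:       # copy the default row on first write
            self.arr[i] = self.row[:]
        self.arr[i][j] = value

a = RectangularArray(1000, 1000)
a[2, 3] = 7
print(a[2, 3])                        # 7
print(a[5, 5])                        # 0 -- untouched row, no storage cost
print(sum(r is not None for r in a.arr))  # 1 -- only one row materialized
```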
Googling for "numpy sparse" finds:
http://www.scipy.org/SciPy_Tutorial
Maybe one of the sparse matrix implementations in scipy works for you.
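For example (a minimal sketch, assuming SciPy is installed; `lil_matrix` is
the format intended for incremental, row-wise construction):

```python
# A scipy.sparse LIL matrix stores only the entries you actually set,
# much like the lazy RectangularArray stores only the written rows.
from scipy.sparse import lil_matrix

m = lil_matrix((1000, 1000))   # logically 1000x1000, but nothing stored yet
m[2, 3] = 7.0
m[2, 4] = 1.5

print(m.nnz)                   # 2 -- only two entries actually stored
row = m.getrow(2).toarray()    # densify a single row when you need it
print(row[0, 3], row[0, 4])    # 7.0 1.5
```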
Peter
More information about the Python-list mailing list