extremely slow array indexing?

Grace Fang fang.fang2003 at gmail.com
Thu Nov 30 12:17:01 EST 2006


Hi,

I am writing code to sort the columns of a matrix according to the sum of
each column. The dataset is huge (50k rows x 300k cols), so I need to read
it line by line and accumulate the sums to avoid running out of memory.
But it runs very slowly, and I don't know why; part of the code is below.
I suspect the array indexing is the culprit, but I'm not sure. Can anyone
point out what needs to be changed to make it run faster? Thanks in
advance!

...
from numpy import *
...

        currSum = zeros(self.componentcount)   # running column sums
        currRow = zeros(self.componentcount)   # dense buffer for the current row
        for featureDict in self.featureDictList:
            currRow[:] = 0
            # copy the sparse row dict into the dense buffer, element by element
            for components in self.componentdict1:
                if components in featureDict:
                    col = self.componentdict1[components]
                    value = featureDict[components]
                    currRow[col] = value
            currSum = currSum + currRow
...
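In case the fragment is hard to follow out of context, here is a
stripped-down, self-contained toy version of the same loop. The small
componentdict1 and featureDictList values are made up purely for
illustration (the real componentdict1 maps ~300k feature names to column
indices, and featureDictList holds one sparse row dict per line of the
file), but the access pattern is the same as in my real code:

    from numpy import zeros

    # toy stand-ins for the real attributes
    componentcount = 5
    componentdict1 = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}  # feature -> column
    featureDictList = [{'a': 1.0, 'c': 2.0},                   # one sparse row per line
                       {'b': 3.0, 'e': 4.0},
                       {'a': 0.5, 'd': 2.5}]

    currSum = zeros(componentcount)
    currRow = zeros(componentcount)
    for featureDict in featureDictList:
        currRow[:] = 0
        # copy the sparse row into the dense buffer, one element at a time
        for components in componentdict1:
            if components in featureDict:
                currRow[componentdict1[components]] = featureDict[components]
        currSum = currSum + currRow

    print(currSum)   # column sums: 1.5, 3.0, 2.0, 2.5, 4.0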



