[Numpy-discussion] speeding up an array operation

David Warde-Farley dwf at cs.toronto.edu
Thu Jul 9 04:41:14 EDT 2009


On 9-Jul-09, at 1:12 AM, Mag Gam wrote:

> Here is what I have, which loads the array one row at a time:
>
> import csv
>
> z = {}  # per-key counter: next free slot for each key p
> r = csv.reader(f)  # f is an open file object
> for row in r:
>     p = "/MIT/" + row[1]
>
>     if p not in z:
>         z[p] = 0
>     else:
>         z[p] += 1
>
>     arr[p]['chem'][z[p]] = tuple(row)  # this loads the array one row at a time
>
>
> I would like to avoid loading the array one row at a time; instead I
> would like to bulk-load it. Let's say load 5 million lines into memory
> and then push them into the array. Any ideas on how to do that?


Depending on how big your data is, this looks like a job for, e.g.,
numpy.loadtxt(), which will give you one big array.

Then sort the array on the second column, so that all the rows with
the same 'p' appear one after the other. Then you can assign whole
slices of this big array to arr[p]['chem'], something like the sketch
below.
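
Here is a minimal sketch of that idea. The filename, the delimiter,
and the final arr assignment are assumptions on my part, since your
post doesn't show how arr is built:

import numpy as np

# Load the whole CSV in one call instead of row by row.
# dtype=str keeps every column as text; adjust as needed.
data = np.loadtxt('data.csv', delimiter=',', dtype=str)

# Build the "/MIT/" + row[1] key for every row, then sort so that
# rows sharing a key end up adjacent.
keys = np.char.add('/MIT/', data[:, 1])
order = np.argsort(keys, kind='stable')
data, keys = data[order], keys[order]

# np.unique with return_index gives the first position of each
# distinct key, so each key's rows form one contiguous slice.
uniq, starts = np.unique(keys, return_index=True)
ends = np.append(starts[1:], len(keys))

for p, lo, hi in zip(uniq, starts, ends):
    block = data[lo:hi]  # every row for key p, in one shot
    # arr[p]['chem'][:len(block)] = ...  # one slice assignment per key

Because the rows are contiguous after the sort, each key's data moves
in a single slice assignment instead of millions of one-element writes.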

David


