[Numpy-discussion] Advice on converting iterator into array efficiently

Thu Aug 28 21:26:18 EDT 2008

Alan Jackson wrote:
> Looking for advice on a good way to handle this problem.
>
> I'm dealing with large tables (Gigabyte large). I would like to 
> efficiently subset values from one column based on the values in
> another column, and get arrays out of the operation. For example,
> say I have 2 columns, "energy" and "collection". Collection is
> basically an index that flags values that go together, so all the
> energy values with a collection value of 18 belong together. I'd
> like to be able to set up an iterator on collection that would
> hand me an array of energy on each iteration :
>
> if table is all my data, then something like
>
> for c in table['collection'] :
>     e = c['energy']
>     ... do array operations on e
>
> I've been playing with pytables, and they help, but I can't quite
> seem to get there. I can get an iterator for energy within a collection,
> but I can't figure out an efficient way to get an array out.
>
> What I have so far is 
>
> for c in np.unique(table.col('collection')) :
>     rows = table.where('collection == c')
>     for row in rows :
>         print c,' : ', row['energy']
>
> but I really want to convert rows['energy'] to an array.
>
> I've thought about building a nasty set of pointers and whatnot -
> I did it once in perl - but I'm hoping to avoid that.
>

I do stuff like this all the time:

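import numpy as np

t = table[:]  # read the whole table into memory as a structured array
collections = np.unique(t['collection'])
for collection in collections:
    cond = t['collection'] == collection  # boolean mask selecting this collection
    energy_this_collection = t['energy'][cond]  # 1-D array of the matching energies
    # ... do array operations on energy_this_collection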
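
The loop above rescans the whole 'collection' column once per unique value. With many collections, one possible variant is to sort by collection once and slice out contiguous runs. This is only a sketch, assuming the data fits in memory as a structured array; the helper name iter_energy_by_collection and the sample data are made up for illustration.

```python
import numpy as np

def iter_energy_by_collection(t):
    """Yield (collection, energy_array) pairs from a structured array t.

    Hypothetical helper: sorts once by 'collection', then splits the
    'energy' column at the points where the collection value changes.
    """
    if len(t) == 0:
        return
    # Stable sort keeps the original row order inside each collection.
    order = np.argsort(t['collection'], kind='stable')
    sorted_coll = t['collection'][order]
    sorted_energy = t['energy'][order]
    # Positions where a new collection value starts.
    boundaries = np.flatnonzero(sorted_coll[1:] != sorted_coll[:-1]) + 1
    starts = np.concatenate(([0], boundaries))
    for start, chunk in zip(starts, np.split(sorted_energy, boundaries)):
        yield sorted_coll[start], chunk

# Made-up sample data standing in for table[:]
t = np.array([(18, 1.0), (7, 2.5), (18, 3.0), (7, 4.0), (3, 0.5)],
             dtype=[('collection', 'i4'), ('energy', 'f8')])

result = {int(c): e for c, e in iter_energy_by_collection(t)}
for c in sorted(result):
    print(c, ':', result[c])
```

Each yielded value is a plain NumPy array, so array operations apply directly. If the table is too large to load with table[:], PyTables' Table.read_where('collection == c', field='energy') may be worth trying instead, since it returns the matching energies as an array for one collection at a time.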

HTH,
Andrew