[Numpy-discussion] Advice on converting iterator into array efficiently
Andrew Straw
strawman at astraw.com
Thu Aug 28 21:26:18 EDT 2008
Alan Jackson wrote:
> Looking for advice on a good way to handle this problem.
>
> I'm dealing with large tables (Gigabyte large). I would like to
> efficiently subset values from one column based on the values in
> another column, and get arrays out of the operation. For example,
> say I have 2 columns, "energy" and "collection". Collection is
> basically an index that flags values that go together, so all the
> energy values with a collection value of 18 belong together. I'd
> like to be able to set up an iterator on collection that would
> hand me an array of energy on each iteration :
>
> if table is all my data, then something like
>
> for c in table['collection'] :
>     e = c['energy']
>     ... do array operations on e
>
> I've been playing with pytables, and they help, but I can't quite
> seem to get there. I can get an iterator for energy within a collection,
> but I can't figure out an efficient way to get an array out.
>
> What I have so far is
>
> for c in np.unique(table.col('collection')) :
>     rows = table.where('collection == c')
>     for row in rows :
>         print c, ' : ', row['energy']
>
> but I really want to convert rows['energy'] to an array.
>
> I've thought about building a nasty set of pointers and whatnot -
> I did it once in perl - but I'm hoping to avoid that.
>
>
I do stuff like this all the time:
t = table[:]  # read the whole table into a NumPy structured array
collections = np.unique(t['collection'])
for collection in collections:
    cond = t['collection'] == collection
    energy_this_collection = t['energy'][cond]
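[Editor's note: the mask-per-collection pattern above can be sketched end to end with a small structured array standing in for `table[:]`; the column names match the thread, but the sample values are invented for illustration.]

```python
import numpy as np

# Stand-in for `t = table[:]`: a structured array with the two columns
# from the thread, "collection" and "energy" (values are made up).
t = np.array(
    [(18, 1.0), (18, 2.0), (7, 5.0), (18, 3.0), (7, 7.0)],
    dtype=[('collection', 'i4'), ('energy', 'f8')],
)

# For each unique collection value, a boolean mask selects the matching
# rows, and fancy indexing hands back the energies as a plain ndarray.
grouped = {}
for collection in np.unique(t['collection']):
    cond = t['collection'] == collection
    grouped[int(collection)] = t['energy'][cond]

print(grouped[18])  # the three energies belonging to collection 18
```

Each value in `grouped` is a real NumPy array, so the per-collection array operations the original poster asked about work directly on it.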
HTH,
Andrew
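[Editor's note: for the PyTables side of the question specifically, `Table.read_where()` evaluates a condition and returns a NumPy array directly, avoiding the per-row loop the original poster was stuck with. A sketch, assuming PyTables is installed; the file name, column names, and values are invented for illustration.]

```python
import numpy as np
import tables  # PyTables

# Build a tiny in-memory HDF5 file so the sketch is self-contained
# (H5FD_CORE with no backing store never touches disk).
h5 = tables.open_file('demo.h5', mode='w', driver='H5FD_CORE',
                      driver_core_backing_store=0)
dtype = np.dtype([('collection', 'i4'), ('energy', 'f8')])
table = h5.create_table('/', 'data', dtype)
table.append(np.array([(18, 1.0), (7, 5.0), (18, 2.0)], dtype=dtype))
table.flush()

# read_where() returns the matching rows as a NumPy array in one call;
# field='energy' restricts the result to that single column. The loop
# variable `c` is picked up from the local scope by the condition.
grouped = {}
for c in np.unique(table.col('collection')):
    grouped[int(c)] = table.read_where('collection == c', field='energy')
h5.close()
```

Unlike `table.where()`, which yields row iterators, `read_where()` materializes the selection, so `grouped[18]` here is already the energy array for that collection.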