[AstroPy] SQL-like queries against astropy.table objects

Erik Bray embray at stsci.edu
Mon Feb 10 11:09:23 EST 2014


On 02/08/2014 12:28 PM, Derek Homeier wrote:
> Hi,
>
> I know almost nothing about SQL, but I agree that for basic conditionals that can be used
> on numpy arrays this really is straightforward.
>
> On 08.02.2014, at 3:00PM, Eric Jensen <ejensen1 at swarthmore.edu> wrote:
>
>> In [35]:
>>
>> # Select only entries with long baselines in U and W, and Real amplitudes > 1 Jy:
>> good_rows = np.logical_and(t['U'] > 100, t['V'] > 100, t['Real'] > 1)
>> In [42]:
>>
>> # Make a new table from this selection, and see how many rows were in
>> # new table vs. the original one:
>> new_table = t[good_rows]
>> new_table['U'].shape
>>
...

In fairness, for a very *large* table this sort of thing could use up a lot of 
memory, especially considering that Numpy doesn't use bit arrays for boolean 
masks :/

Of course the OP said it was a small table so that's beside the point here; but 
an actual database should provide an efficient row-based iterator with such a 
selection criteria, even hypothetically over millions of rows.  Of course if 
that's what one needs one should just loader their data into a database :)

Erik B.

>
>> For example, say I have a table with the following columns (not the use
>> case, but an easy demo):
>> JD, mag, mag_error
>>
>> I could find all data points with an SNR > 100 within a date range by:
>>      SELECT * from table WHERE (JD > value AND JD < value AND
>> mag/mag_err > 100);
>
> new_table = t[[('JD']>minval)&(t['JD']<maxval)&(t['mag']/t['mag_err']>100)]
> print(len(t), len(new_table))
>
> directly gives you the desired selection (which works the same with any numpy structured array.
> Not very painful, even if it might get a bit lengthy to type. I don't think though apt.Table offers a more
> comfortable way like e.g. pytables does with tbl.where('JD>minval & JD<maxval')…
>
> I am sure there's more powerful possibilities in SQL that are not as easy to reproduce;
> operating on string columns is a bit more inconvenient, since numpy does not offer a lot of operations
> working on arrays of str type; in this case I think  you have to resort to python loops, but in this case
> astropy Tables add_column[s] helps a lot over plain numpy. I have not explored the grouping methods
> either.
>
> HTH,
>   						Derek
>
>
> _______________________________________________
> AstroPy mailing list
> AstroPy at scipy.org
> http://mail.scipy.org/mailman/listinfo/astropy
>




More information about the AstroPy mailing list