[SciPy-User] Filtering record arrays by contents of columns using `ismember`-like syntax

Skipper Seabold jsseabold at gmail.com
Tue May 24 16:47:40 EDT 2011


On Tue, May 24, 2011 at 4:39 PM, Chris Rodgers
<chris.rodgers at berkeley.edu> wrote:
> Thanks to everyone for their comments!
>
> Concerning the speed of numpy.in1d: my guess is that in1d works best
> for arrays of comparable size, and this method works best for the
> special case when one array contains just a few values. I suppose it
> might make sense for me to break this into two objects. The first
> would replicate in1d for this use case. The second would supply the
> syntactic simplification for filtering.
>

Might it make sense to just patch in1d to handle this case? I'm not so
sure though.
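
For reference, here is a minimal sketch of the special case under
discussion: building the mask by OR-ing one equality test per wanted
value, which is the approach one would expect to do well when the
second array holds only a few values. The helper name isin_few and the
sample data are made up for illustration and are not taken from Chris's
code. The sketch checks its result against np.isin, the newer
counterpart of in1d that gives the same answer for 1-D input. It makes
no claim about which approach is faster; that would need timing on
real data.

```python
import numpy as np


def isin_few(column, values):
    """Hypothetical helper: membership mask from one equality test per value."""
    column = np.asarray(column)
    mask = np.zeros(column.shape, dtype=bool)
    for v in values:
        # Each pass is a single vectorized comparison over the whole column.
        mask |= (column == v)
    return mask


column = np.array([3, 1, 4, 1, 5, 9, 2, 6])
wanted = [1, 9]

mask = isin_few(column, wanted)

# Sanity check against NumPy's own membership test.
same_as_isin = bool(np.array_equal(mask, np.isin(column, wanted)))
```

The loop costs one pass over the column per wanted value, so it is only
a plausible candidate when the list of values is short.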

> Concerning the use of PyTables: I definitely agree that is the answer
> for complex queries. I see this object as solving a narrow slice of
> problems between the complex (PyTables) and the trivially simple
> (explicit mask). For whatever reason a lot of my actual day-to-day
> problems fall into that category. Probably because I'm porting this
> code from Matlab and that's just the Matlab way of thinking about
> things.
>
>
>> Just create an appropriate ticket with code, docstring, examples and
>> test cases. :-)
>> At least then it would not get lost in the email archives.
>
> I'm happy to do that, though having never done this, I'm not sure
> where the "appropriate" place is (scipy trac, numpy trac, Cookbook, etc...)

Yeah, I was thinking about doing this myself. You might want to create
a fork of numpy, implement a function or method, and then request a
review. This is sure to need plenty of testing. If it's a function,
where should it reside? I was thinking of numpy.lib.recfunctions, but
it's not strictly for structured/record arrays. Any other ideas?
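
To make the question concrete, a rough sketch of what such a function
might look like follows. Everything here is an assumption for the sake
of discussion: the name filter_records, the keyword-argument interface,
and the sample fields are invented, and this is not the object Chris
posted. It only shows the idea of filtering a structured array by the
contents of named columns, using np.isin for the membership test.

```python
import numpy as np


def filter_records(rec, **criteria):
    """Hypothetical helper: keep rows whose named columns hold an allowed value.

    Each keyword is a field name; its value is the collection of values
    accepted for that field. Criteria are combined with logical AND.
    """
    mask = np.ones(len(rec), dtype=bool)
    for name, allowed in criteria.items():
        mask &= np.isin(rec[name], allowed)
    return rec[mask]


# Made-up sample data: a small structured array of trials.
trials = np.array(
    [(1, "left", 0.31), (2, "right", 0.52), (3, "left", 0.47), (4, "none", 0.90)],
    dtype=[("trial", "i4"), ("side", "U5"), ("rt", "f8")],
)

picked = filter_records(trials, side=["left", "right"], trial=[2, 3, 4])
```

Since the body only indexes by field name, the same code would work on
anything that supports rec[name] returning an array, which is part of
why recfunctions may not be the obvious home for it.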

You might find this helpful for getting started:
http://docs.scipy.org/doc/numpy/dev/gitwash/index.html

Skipper


