[Numpy-discussion] searchsorted() and memory cache

Andrew Straw strawman at astraw.com
Tue May 13 21:08:18 EDT 2008


Charles R Harris wrote:
>
>
> On Tue, May 13, 2008 at 5:59 PM, Andrew Straw
> <strawman at astraw.com> wrote:
>
>     Thanks for all the comments on my original question. I was more
>     offline than intended after sending it, so I'm sorry I wasn't
>     immediately able to participate in the discussion.
>
>     Anyhow, after working on this a bit more, I came up with a few
>     implementations of search algorithms that do just what I need, all
>     with the same interface, available via Bazaar and Launchpad at
>     http://launchpad.net/~astraw/+junk/fastsearch (MIT license). I have
>     attached the output of the plot_comparisons.py benchmarking script
>     to this email (note that this benchmarking is pretty crude).
>
>     For the problem I originally wrote about, I get a nearly
>     unbelievable speedup of ~250x using the
>     fastsearch.downsamp.DownSampledPreSearcher class, which is very
>     similar in spirit to Charles' suggestion. It takes 1000 values from
>     the original array to create a new first-level array that is itself
>     localized in memory and points to a more localized region of the
>     full original array.
>     Also, I get a similar (though slightly slower) result with AVL
>     trees via the fastsearch.avlsearch.AvlSearcher class, which uses
>     pyavl (http://sourceforge.net/projects/pyavl).
>
>     Using the benchmarking code included in the bzr branch, I don't
>     get anything like this speedup (e.g. the attached figure), so I'm
>     not sure exactly what's going on at this point. But I'm not going
>     to argue with a 250x speedup, so the fastsearch.downsamp code is
>     now being put to use in one of my projects.
>
>     Stefan -- I think your code simply implements the classic binary
>     search
>     -- I don't see how it will reduce cache misses.
>
>     Anyhow, perhaps someone will find the above useful. I guess it
>     would still be a substantial amount of work to make a
>     numpy-types-aware implementation of AVL trees or similar
>     algorithms. These sorts of binary search trees seem like the
>     right way to solve this problem, so there might be an interesting
>     project in this. I imagine that a numpy-types-aware Cython would
>     make such an implementation significantly easier, and it would
>     still be blazingly fast compared to the binary search implemented
>     in searchsorted() given today's cached memory architectures.
>
>
> That's pretty amazing, but I don't understand the graph. The
> DownSampled search looks like the worst. Are the curves mislabeled?
> Are the axes correct? I'm assuming smaller is better here.
The lines are labeled properly -- the graph is inconsistent with the
findings on my real data (not shown), which is what I meant by "Using
the benchmarking code included in the bzr branch, I don't get anything
like this speedup (e.g. the attached figure)". My guess is that the
BinarySearcher scales terribly under some usage pattern that isn't
exhibited by this test. I'm really not sure yet what the important
difference is between my real data and these synthetic data. I will
keep the list posted as I find out more. Clearly, on the synthetic
data for the benchmark, the BinarySearcher does pretty well when N
items is large. This is quite contrary to my theory that cache misses
are the root of my problem with the binary search, so I don't
understand it at the moment, but certainly both of the other searchers
perform better on my real data.
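
In case it helps the discussion, below is a minimal sketch of the
two-level idea. This is illustrative only -- the class name, its
searchsorted() method, and the step parameter are my own, not the
actual fastsearch.downsamp API -- and it assumes a sorted 1-D input
array and side='right' semantics:

import numpy as np

class DownsampledSearcher:
    """Two-level search (illustrative sketch, not the fastsearch API).

    A small first-level array of every step-th element should stay
    resident in cache across repeated queries, and each query then
    needs a fine binary search over only one short, contiguous slice
    of the full sorted array.
    """

    def __init__(self, sorted_arr, step=1000):
        self.arr = sorted_arr
        self.step = step
        # First-level index: one value per block of `step` elements.
        self.coarse = sorted_arr[::step]

    def searchsorted(self, values):
        # Coarse pass: find the block each query value falls into.
        block = np.searchsorted(self.coarse, values, side='right') - 1
        block = np.maximum(block, 0)
        out = np.empty(len(values), dtype=np.intp)
        for i, (b, v) in enumerate(zip(block, values)):
            lo = b * self.step
            hi = min(lo + self.step + 1, len(self.arr))
            # Fine pass: binary search within a single short slice.
            out[i] = lo + np.searchsorted(self.arr[lo:hi], v,
                                          side='right')
        return out

if __name__ == '__main__':
    # Sanity check against numpy's own searchsorted().
    arr = np.sort(np.random.random(10**7))
    queries = np.random.random(1000)
    s = DownsampledSearcher(arr)
    expected = np.searchsorted(arr, queries, side='right')
    assert np.array_equal(s.searchsorted(queries), expected)

The per-query Python loop obviously adds overhead, so any real speedup
would come from a C or Cython version of the same thing; the sketch is
just to show that the fine search never leaves one step-sized block.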

I will post any new insights as I continue to work on this...

-Andrew


