[Numpy-discussion] searchsorted() and memory cache
Stéfan van der Walt
stefan at sun.ac.za
Sat May 10 11:31:39 EDT 2008
Hi Andrew
2008/5/9 Andrew Straw <strawman at astraw.com>:
> I've got a big array (25 million int64 elements) that searchsorted()
> takes a long time to grind through. After a bit of digging in the
> literature and the numpy source code, I believe that searchsorted() is
> implementing a classic binary search, which is pretty bad in terms of
> cache misses. There are several modern implementations of binary search
> which arrange items in memory such that cache misses are much more rare.
> Clearly, making such an indexing arrangement would take time, but in my
> particular case I can spare the time to create an index if searching
> were faster, since I'd build the index once but search it many times.
>
> Is there an implementation of such an algorithm that works easily with
> numpy? Also, can you offer any advice, suggestions, or comments should
> I attempt to implement such an algorithm myself?
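For context, the workload described above is "build one large sorted array once, then query it many times" with repeated binary searches. A minimal sketch of that pattern (the data here is made up for illustration; only the shapes matter):

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical data, for illustration only
haystack = np.sort(rng.integers(0, 2**40, size=1_000_000).astype(np.int64))
needles = rng.integers(0, 2**40, size=10_000).astype(np.int64)

# Each lookup is a classic binary search over the full sorted array, so
# the first ~log2(n) probes land on widely separated cache lines -- this
# is the cache-miss behaviour the thread is about.
positions = np.searchsorted(haystack, needles)
```

Since the probe sequence depends only on the array size and the query value, any speedup has to come from rearranging the data (as discussed below), not from the query loop itself.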
I found Francesc Altet's Pyrex implementation at
http://mail.python.org/pipermail/python-list/2007-November/466503.html
I modified it for use with Cython and added some tests:
https://code.launchpad.net/~stefanv/+junk/my_bisect
That may be a good starting point for further experimentation. As it
stands, it is already about ten times faster than the built-in version,
since it can assume the input is int64 and therefore skips the special
type checking.
Regards
Stéfan