A faster way of finding historical highs/lows

Peter Hansen peter at engcorp.com
Mon Jun 14 09:05:29 EDT 2004


Eamonn Sullivan wrote:

> Peter Hansen <peter at engcorp.com> wrote in message news:<SN2dnT9ob92aPFTdRVn-gQ at powergate.ca>...
>>The fastest algorithm might depend on how you use the data, as well.
> 
> Thanks for this. At the moment, the software answers a few questions
> (highest/lowest, longest streak, etc.) once per retrieval of data. The
> database retrieval, though, is *by far* the biggest time sapper, so a
> little preprocessing would be almost unnoticeable in comparison,
> probably (depends on how much, of course).
> 
> So, I'm guessing I could sort on the particular price field I'm using
> (decorate-sort-undecorate), and then just find the most recent date
> among the subset of data that meets the criteria (higher or lower). Is
> that what you mean? 

That's exactly the type of thing I meant, and even the specific
improvement I was considering.

> By binary search, do you mean further reorganizing
> the data into a binary tree using date?

The binary search is just a lot faster way of homing in on a region
of the data compared to most other approaches.  If it's sorted by
price, then binary search to get to the chunk with prices equal
to what you want.  The last one (assuming ascending dates) is the
"most recent with an equal or lower price", or if there is none
equal, pick the one to the left of the final binary search point...
that kind of thing.

For a binary search, you just examine the data element halfway between
two end points each time, comparing it with the search target.
Decide if you need to continue search to the left or the right of
the current position, then pick a point halfway into _that_ region
and do it again.  Eventually you are down to a region with a single
point, and then you're done.  It takes only log2(n) comparisons to
search in n points, and often less...

-Peter



More information about the Python-list mailing list