Data structure recommendation?

Steven Clark steven.p.clark at gmail.com
Mon Apr 7 17:55:07 EDT 2008


> > I believe the best way to implement this would be a binary search
> > (bisect?) on the actual times, which would be O(log N). Though since
> > they are timestamps they should be monotonically increasing, in which
> > case at least you don't have to go to the expense of sorting them.
> >
> > "Some kind of hash function" won't hack it, since the purpose of a hash
> > function is to map a large number of (possibly) evenly-distributed
> > (potential) keys as nearly as possible randomly across a much smaller
> > set of actual values.
> >
> > You might try messing around with reducing the precision of the numbers
> > to home in on a gross region, but I am not convinced that does anything
> > other than re-spell binary search if carried to extremes.
> >

Thanks all for your comments, which basically confirmed for me that
there is no magic bullet in this situation.
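Just so we're talking about the same baseline, the bisect approach
suggested above would be roughly the sketch below (get_index is just a
made-up helper name; times is assumed to be an already-sorted list of
float timestamps):

import bisect

def get_index(times, x):
    """Return the index of the last timestamp <= x,
    or -1 if x precedes every timestamp in the (sorted) list."""
    return bisect.bisect_right(times, x) - 1

# e.g. get_index([1.0, 2.5, 3.7, 3.8, 3.9], 3.85) -> 3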
To add an additional wrinkle to the problem, I know a priori that for
sequential calls to foo.get(x), x is very close to the previous x
about 95% of the time.
In other words, you might see foo.get(3.9), foo.get(3.8),
foo.get(3.7), etc. (think VCR controls, rewinding).
It seems to me that, because of this, saving the previous position and
doing a linear search from there could actually be faster than a
binary search. But you would pay a big penalty when x jumps by a lot.
Hrm, a balancing act...
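
Roughly what I have in mind is the sketch below (TimestampIndex, get()
and the window size are just placeholder names, nothing settled): keep
the index of the last hit, scan a few neighbouring entries first, and
only fall back to bisect when the query has jumped far away.

import bisect

class TimestampIndex:
    """Sketch only: find the index of the event at or before time x,
    assuming the list is non-empty and successive queries are usually
    close together."""

    def __init__(self, times):
        self.times = sorted(times)  # timestamps, kept sorted
        self.last = 0               # index returned by the previous get()

    def get(self, x, window=8):
        times, i = self.times, self.last
        for _ in range(window):
            if i + 1 < len(times) and times[i + 1] <= x:
                i += 1              # x is still ahead of us: step right
            elif i > 0 and times[i] > x:
                i -= 1              # x is behind us: step left
            else:
                self.last = i       # settled: times[i] <= x < times[i + 1]
                return i
        # x jumped too far from the previous position: O(log N) fallback
        i = bisect.bisect_right(times, x) - 1
        self.last = max(i, 0)
        return self.last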

Regarding monotonically increasing timestamps: I had initially assumed
so, but I may want to allow adding events back in time. What is the
most efficient way to keep the list sorted after each put()?
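The obvious candidate I can see is bisect.insort, roughly like the
sketch below (EventLog and put() are just illustrative names), though
the insert itself is still O(N):

import bisect

class EventLog:
    """Sketch only: events stored as (timestamp, payload) tuples,
    kept sorted by timestamp at all times."""

    def __init__(self):
        self.events = []

    def put(self, timestamp, payload):
        # insort finds the slot in O(log N), but the insert itself is
        # O(N) because everything after it has to shift.  Assumes the
        # tuples compare cleanly (distinct timestamps, or comparable
        # payloads).
        bisect.insort(self.events, (timestamp, payload))

If out-of-order puts turn out to be rare, another option might be to
just append and re-sort before the next get(); Python's sort handles
nearly sorted data quickly.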

Thanks again.


