Comparing lists

Mon Oct 17 19:08:47 EDT 2005

aleaxit at yahoo.com (Alex Martelli) writes:
> implementation of the components one's considering!  Rough ideas of
> *EXPECTED* run-times (big-Theta) for various subcomponents one is
> sketching are *MUCH* more interesting and important than "asymptotic 
> worst-case for amounts of input tending to infinity" (big-O) -- for

I thought big-Theta meant the intersection of big-O (upper bound on
the worst case) and big-Omega (lower bound on the worst case).

> example, where I sketch-in (mentally, on paper, or on whiteboard) a
> "hash table" subcomponent, I consider the *expected* (Theta) performance
> (constant-time lookups), definitely NOT the big-O "linear time" lookups
> which just MIGHT occur (if, say, all inputs just happened to hash to the
> same value)... otherwise, I'd never use hash tables, right?-)

You really have to be careful about choices like that.  See:

   http://www.cs.rice.edu/~scrosby/hash/

which I also cited last night.  

Exercise: suspend disbelief for a moment and imagine that 1) Google
search works by spidering the web and building a giant hash table of
words that it finds in web pages, to use for servicing future queries;
2) The hash function is similiar to the one used in Python dicts and
is either public knowledge or else leaks out of the company somehow;
and 3) (biggest disbelief suspension of them all) I work for
Microsoft.  Question: how could I use knowledge of the hash function
to give Google a hard time?

At least one well known implementer apparently does intend to quit
using hash tables due to considerations like this:

   http://cr.yp.to/critbit.html