Performance: sets vs dicts.

Paul Rubin no.email at nospam.invalid
Wed Sep 1 02:42:15 EDT 2010


Terry Reedy <tjreedy at udel.edu> writes:
> Does anyone seriously think that an implementation should be rejected
> as an implementation if it intelligently did seq[n] lookups in
> log2(n)/31 time units for all n (as humans would do), instead of
> stupidly taking 1 time unit for all n < 2**31 and rejecting all larger
> values (as 32-bit CPython does)?

Er, how can one handle n > 2**31 at all in 32-bit CPython?

>>  Dicts that were not O(1) for access with non-pathological hashing?
>
> You would actually be unhappy if small dicts ran even faster than they
> do now (while large dicts ran the same)? Not me!

I'm not sure what you are referring to there.  I'd actually be OK with
using O(log n) dicts to eliminate the O(n) worst-case behavior of the
hash-based implementation on adversarial inputs.
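Here's a quick sketch of the adverse case I mean (a toy example of my
own, not anything from the thread; BadHash and build are made-up
names).  A constant __hash__ sends every key down the same probe
sequence, so each insert has to compare against the keys already in
the chain, and building a dict of n such keys does O(n**2) total work:

import timeit

class BadHash:
    # Constant hash forces every instance into one collision chain.
    def __init__(self, x):
        self.x = x
    def __hash__(self):
        return 42
    def __eq__(self, other):
        return self.x == other.x

def build(n, cls):
    d = {}
    for i in range(n):
        d[cls(i)] = i   # probes every previously inserted colliding key
    return d

for n in (100, 200, 400):
    t = timeit.timeit(lambda: build(n, BadHash), number=5)
    print(n, t)   # time roughly quadruples as n doubles

Doubling n roughly quadruples the build time; an O(log n) tree-based
dict would cap the total at O(n log n) even on inputs like these.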

> Similarly, books say that O(n*log n) sorting is 'better' than O(n**2)
> sorting. However, Tim Peters verified that the opposite is true for
> real times for small n, and hence the current list.sort smartly uses a
> fast O(n**2) algorithm for small lists (len < 64) and small runs in
> larger lists. Rejecting list.sort for this reason would be stupid.

That algorithm is still O(n log n): the O(n**2) insertion sort only
ever runs on blocks of bounded length, so over the whole list it
contributes at most a linear term, which the O(n log n) bound absorbs.
64 actually sounds suspiciously large for the cutover, but I'm sure
Tim tuned it carefully.
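For the curious, here is a toy hybrid mergesort in that spirit (my own
sketch, not CPython's actual timsort; CUTOFF, insertion_sort, and
hybrid_sort are made-up names).  The insertion sort only ever touches
blocks of at most CUTOFF elements, so its O(n**2) cost totals O(n)
over the whole list and the overall bound stays O(n log n):

CUTOFF = 64   # matches the figure quoted above; the real value is tuned

def insertion_sort(a):
    # O(k**2) on a block of length k, but fast for small k.
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def hybrid_sort(a):
    if len(a) <= CUTOFF:
        return insertion_sort(a)
    mid = len(a) // 2
    left, right = hybrid_sort(a[:mid]), hybrid_sort(a[mid:])
    # Standard two-way merge of the sorted halves.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out

Both pieces are stable (ties keep their original order), as list.sort
itself is.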


