item access time: sets v. lists

Duncan Booth duncan.booth at invalid.invalid
Wed Oct 4 16:14:33 EDT 2006


"sjdevnull at yahoo.com" <sjdevnull at yahoo.com> wrote:

> A set, on the other hand, uses a hash table so finding an element
> takes constant time (it's one hash lookup, independent of the size of
> the set)--and determining an item isn't there is likewise constant
> time. 
> 
One hash calculation, but there could be collisions. Worst case for an n
element dict/set could be that it takes n attempts to find a value, 
although unless you try to do it deliberately by ensuring that the keys 
have identical hash values this won't happen in practice.

Paul McGuire wrote:
> Thanks, I stand corrected.  How do they know how big a hash table to
> use?  I think this is an interesting problem.

If I read the code correctly:

Minimum size is 8. Whenever more than 2/3 of the slots are in use 
(including slots where the element has been deleted) the dictionary is 
resized to the smallest power of 2 (greater than 8) which is greater than 4 
times the number of currently occupied slots (or 2 times the number of 
occupied slots when more than 50000 slots are occupied). This can reduce 
the size if a lot of elements have been deleted.



More information about the Python-list mailing list