Why searching in a set is much faster than in a list ?

Steven D'Aprano steve+comp.lang.python at pearwood.info
Wed Sep 28 04:23:23 EDT 2016


On Wednesday 28 September 2016 15:51, ast wrote:

> Hello
> 
> I noticed that searching in a set is faster than searching in a list.
[...]
> I tried a search in a tuple, it's not different that in a list.
> Any comments ?


A list, array or tuple is basically a linear array that needs to be searched 
one item at a time:

[ a | b | c | d | ... | x | y | z ]

To find x, Python has to start at the beginning and look at every item in turn, 
until it finds x.

But a set is basically a hash table. It is an array, but laid out differently, 
with blank cells:

[ # | # | h | p | a | # | m | y | b | # | # | f | x | ... | # ]

Notice that the items are jumbled up in arbitrary order. So how does Python 
find them?

Python calls hash() on the value, which returns a number, and that points 
directly to the cell which would contain the value if it were there. So if you 
search for x, Python calls hash(x) which will return (say) 12. Python then 
looks in cell 12, and if x is there, it returns True, and if it's not, it 
returns False. So instead of looking at 24 cells in the array to find x, it 
calculates a hash (which is usually fast), then looks at 1 cell.

(This is a little bit of a simplification -- the reality is a bit more 
complicated, but you can look up "hash tables" on the web or in computer 
science books. They are a standard data structure, so there is plenty of 
information available.)

On average, if you have a list with 1000 items, you need to look at 500 items 
before finding the one you are looking for. With a set or dict with 1000 items, 
on average you need to look at 1 item before finding the one you are looking 
for. And that is why sets and dicts are usually faster than lists.



-- 
Steven
git gets easier once you get the basic idea that branches are homeomorphic 
endofunctors mapping submanifolds of a Hilbert space.




More information about the Python-list mailing list