Why searching in a set is much faster than in a list ?

ast nomail at com.invalid
Wed Sep 28 06:47:56 EDT 2016


"Steven D'Aprano" <steve+comp.lang.python at pearwood.info> a écrit dans le message de 
news:57eb7dff$0$1598$c3e8da3$5496439d at news.astraweb.com...
> On Wednesday 28 September 2016 15:51, ast wrote:
>
>> Hello
>>
>> I noticed that searching in a set is faster than searching in a list.
> [...]
>> I tried a search in a tuple, it's not different that in a list.
>> Any comments ?
>
>
> A list, array or tuple is basically a linear array that needs to be searched
> one item at a time:
>
> [ a | b | c | d | ... | x | y | z ]
>
> To find x, Python has to start at the beginning and look at every item in turn,
> until it finds x.
>
> But a set is basically a hash table. It is an array, but laid out differently,
> with blank cells:
>
> [ # | # | h | p | a | # | m | y | b | # | # | f | x | ... | # ]
>
> Notice that the items are jumbled up in arbitrary order. So how does Python
> find them?
>
> Python calls hash() on the value, which returns a number, and that points
> directly to the cell which would contain the value if it were there. So if you
> search for x, Python calls hash(x) which will return (say) 12. Python then
> looks in cell 12, and if x is there, it returns True, and if it's not, it
> returns False. So instead of looking at 24 cells in the array to find x, it
> calculates a hash (which is usually fast), then looks at 1 cell.
>
> (This is a little bit of a simplification -- the reality is a bit more
> complicated, but you can look up "hash tables" on the web or in computer
> science books. They are a standard data structure, so there is plenty of
> information available.)
>
> On average, if you have a list with 1000 items, you need to look at 500 items
> before finding the one you are looking for. With a set or dict with 1000 items,
> on average you need to look at 1 item before finding the one you are looking
> for. And that is why sets and dicts are usually faster than lists.
>
>
>
> -- 
> Steven
> git gets easier once you get the basic idea that branches are homeomorphic
> endofunctors mapping submanifolds of a Hilbert space.
>

Thanks a lot, very interesting. 




More information about the Python-list mailing list