item access time: sets v. lists

Wed Oct 4 15:03:24 EDT 2006

David Isaac wrote:
> Paul M. wrote:
> > Random access to item in list/set when item exists
> > set  -> 0.000241650824337
> > list -> 0.0245168031132
> >
> > Random access to item in list/set when item does not exist
> > set  -> 0.000187733357172
> > list -> 0.522086186932
>
>
> OK, that's a much better set of answers
> including to questions I did not
> even know I wanted to ask until
> I saw your post.  But, how to
> explain the above??

The code was flawed, but it illustrates a point.  To find an arbitrary
item in a list takes time proportional to the length of the list--on
average, you have to scan n/2 items in a list of length n to find it.
"x in list" really has to check item 0, item 1, item 2, etc until it
finds x or reaches the end of the list.

Obviously in a sorted list like this, finding 0 will be much faster
than finding 9999 since you have to scan almost the whole list to find
the latter.  Of course, if you know the list is sorted you could use a
binary search (which will find the item in log(n) comparisons on
average).

So really to find a random item that is in the list is going to take
more like 0.250 seconds on average, give or take a little.

When the item doesn't occur in the list, you wind up having to check
every element to see if it's there.  (binary search would work for this
case as well for a sorted list).

A set, on the other hand, uses a hash table so finding an element takes
constant time (it's one hash lookup, independent of the size of the
set)--and determining an item isn't there is likewise constant time.

(if your data is degenerate and you have many things in the set that
all hash to the same value then the lookup could take longer, but
that's pretty unlikely to make a big difference in practice with python
sets and user data).