[Tutor] Performance difference, ``in'' vs ``has_key()''

Mon Jul 18 06:55:09 CEST 2005

> A related question is where's the trade-off between using ``in'' with a
> list, and a dictionary?  I presume that using it with small hashes will
> be faster than dictionries since it doesn't have to calculate the
> hashes.

Hi Bill,

Scanning for an elements in a list is a "linear" operation, in the sense
that the time it takes to search is proportional to the size of the list.
(Big list == big search time.)

Dictionaries have a lookup time that will probably be a constant-time
operation, regardless of the size of the collection.
(Big dictionary == probably doesn't matter.  *grin*)

This doesn't mean that dictionaries are always faster than lists: as you
know, calculating hash values can take time.  But the cost of hashing is
usually negligible, since many of Python primitive data types (like
strings) automatically cache their hash values too!

There's always a possibility, however remote, that the hash function is
hideous on the kind of data that you're using.  However, Python's
Dictionary implementation has been battle-tested and tuned well, so it's
probably not a mistake to use dictionaries if they seem like the
appropriate data structure.

A good book in algorithms will almost always cover the performance
characteristics of those two strategies; there are also a lot of good
resources on the web about them.  NIST has two articles on those two:

    http://www.nist.gov/dads/HTML/linearSearch.html
    http://www.nist.gov/dads/HTML/hashtab.html

and the excellent Wikipedia has nice readable articles:

    http://en.wikipedia.org/wiki/Linear_search
    http://en.wikipedia.org/wiki/Hashtable
    http://en.wikipedia.org/wiki/Category:Search_algorithms

Good luck to you!