Is 100,000 entries big for a dictionary?

Tim Peters tim.one at home.com
Mon Jan 1 18:43:45 EST 2001


[Tim]
> On most machines hash values are 32 bits, and are always
> computed as such, and are cached in the dict entry.
> When it's time to grow, the table does double in size,
> but hash values are not recomputed -- it simply uses one
> more of the hash bits that's always been there.  Note
> that entries are reinserted when the table grows --
> that's probably not what you meant by "the usual
> extensible hashing mechanism". ..

[rturpin at my-deja.com]
> Actually, that describes exactly what I meant. This is
> now a pretty standard algorithm used in a variety of
> DBMS and other applications. Only the entries whose
> next bit is 1 need to be reinserted, though that can
> cause other entries to move in the lower half if there
> were previous collisions.

Well, at this point you would have an easier time reading dictobject.c than
I would have guessing exactly what you mean <wink>.  Python's hash table
structure is utterly vanilla, a contiguous vector of 2**N structs -- it's
just that sometimes a brand new vector 2x bigger (or 2**M times smaller) may
get allocated and everything moves to the new vector.  All references to
extensible (more often "extendible") hashing I've seen have a much fancier
scheme in mind, with a directory (or index) table pointing to buckets that
can split independently; the thrust of such schemes is to ensure that an
item can be found with no more than two disk accesses (incl. one for the
directory table, if it's paged out).  I believe that's the std meaning, and
it doesn't have anything in common with Python's scheme beyond hash
functions and powers of 2; e.g.,

    http://hissa.nist.gov/dads/HTML/extendibleHashing.html
    http://www.cs.byu.edu/courses/cs453/lecture18.html
    http://www.csm.astate.edu/~rossa/datastruc/Extend.html
    http://www.cs.purdue.edu/homes/sunil/syllabi/541/Lecture5/sld006.htm
    http://www.istis.unomaha.edu/isqa/haworth/isqa3300/fs009.htm

The last link is especially cool because it contains an implementation in
gloriously readable COBOL <wink>.

SIGN-OFF-USING-LY-SIG-ly y'rs  - tim





More information about the Python-list mailing list