Initializing the number of slots in a dictionary

Mon Aug 7 15:22:38 EDT 2006

Jon Smirl wrote:
> On Sun, 06 Aug 2006 15:33:30 -0700, John Machin wrote:
[snip..]
> >
> > Do you have an application with a performance problem? If so, what makes
> > you think inserting 1M items into a Python dict is contributing to the
> > problem?
>
> I know in advance how many items will be added to the dictionary. Most
> dictionary implementations I have previously worked with are more
> efficient if they know ahead of time how big to make their tables.
>
> In this case I only need to do a presence test of the key, there is no
> actual data associated with the key. The table is used for detecting
> duplicate entries. Is there a more efficient to do this test that sticking
> an empty string into a dict? The keys are sha1.digest().

Possibly a naive question - but would using sets be more efficient ?

They are generally used for the sort of job you are describing (testing
for duplicates rather than storing data associated with a key).

We did some limited performance tests at Resolver Systems and found use
of sets and dictionaries to be almost exactly hte same speed - but
there could be memory advantages for you. Performance is also likely to
be different for vast data sets, but I understand the hashing algorithm
to be very similar for both...

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml