Large Dictionaries

"Martin v. Löwis" martin at v.loewis.de
Tue May 30 02:13:54 EDT 2006


Roy Smith wrote:
> My guess would be that each resize grows the table by a constant
> factor, which IIRC, works out to amortized O(n). 

Right. For dicts with fewer than 50000 entries, the table size is
multiplied by 4 on each resize; for larger dictionaries it is doubled.
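
For illustration, here is a rough sketch of that growth rule as it
appears in CPython 2.x's dictresize (the 50000 cutoff and the rounding
up to a power of two are implementation details of that version; the
function name here is just illustrative):

    def guess_new_size(used):
        # Small dicts quadruple, large dicts double ...
        factor = 2 if used > 50000 else 4
        minused = factor * used
        # ... and the actual table size is the smallest power of two
        # larger than that, starting from PyDict_MINSIZE (8).
        new_size = 8
        while new_size <= minused:
            new_size <<= 1
        return new_size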

> It's not as good as
> creating the dict the right size in the first place, but it's really
> not that bad.  Somewhat more troubling is that it can lead to memory
> fragmentation problems.

Depends on your operating system, of course. The dict's table exceeds
a page in size fairly quickly, and at that point many systems satisfy
the allocation with anonymous mappings: a range of pages is mapped
from swap on allocation and unmapped again on deallocation. So while
this can cause address space fragmentation, it doesn't cause memory
fragmentation (i.e. memory that is allocated by the process but can't
be used).
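
As an illustration (assuming a system where large allocations are
served by mmap rather than the heap), here is a minimal sketch of an
anonymous mapping using Python's mmap module; the whole range comes
from the OS when it is created and is returned to the OS when it is
closed:

    import mmap

    # Anonymous mapping: pages come straight from the OS (backed by
    # swap), not carved out of the process heap.
    size = 16 * 1024 * 1024          # 16 MiB, well over a page
    buf = mmap.mmap(-1, size)        # -1 = no file, i.e. anonymous
    buf[:5] = b"hello"               # touch a few pages
    buf.close()                      # the whole range is unmapped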

> I don't understand why Python doesn't have a way to give a size hint
> when creating a dict.  It seems so obvious to be able to say "d = dict
> (size=10*1000*1000)" if you know beforehand that's how many keys
> you're going to add.

There is no need for it. If dicts don't work well in some cases, those
cases should be investigated and fixed, instead of working around them
by pushing the problem onto the developer.

As you said, it *should* take time linear in the number of
insertions. If it doesn't (and it appears not to), that may be due to
hash collisions, or to the time for computing the hashes not being
constant.
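
One way to check is to time the insertions themselves with keys that
hash in constant time and collide rarely; a small sketch (timings and
sizes are only illustrative):

    import time

    def insertion_time(n):
        # Time n insertions of integer keys into a fresh dict.
        start = time.time()
        d = {}
        for i in range(n):
            d[i] = None
        return time.time() - start

    # If insertion is linear overall, a 10x larger n should take
    # roughly 10x longer.
    for n in (10**5, 10**6, 10**7):
        print(n, insertion_time(n))

Since integer keys hash in constant time, any worse-than-linear growth
in this loop would point at the resizing itself rather than at hashing
or collisions.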

Regards,
Martin


