[Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

Mike Coleman tutufan at gmail.com
Sun Dec 21 02:09:00 CET 2008


On Sat, Dec 20, 2008 at 5:40 PM, Alexandre Vassalotti
<alexandre at peadrop.com> wrote:
> Could you give us more information about the dictionary? For example,
> how many objects does it contain? Is 45GB the actual size of the
> dictionary or of the Python process?

The 45G was the VM size of the process (resident size was similar).

The dict keys were all uppercase alpha strings of length 7.  I don't
have access at the moment, but maybe something like 10-100M of them
(not sure how redundant the set is).  The values are all lists of
pairs, where each pair is a (string, int).  The pair strings are of
length around 30, and drawn from a "small" fixed set of around 60K
strings.  As mentioned previously, I think the ints are drawn
pretty uniformly from something like range(10000).  The length of the
lists depends on the redundancy of the key set, but I think there are
around 100-200M pairs total, for the entire dict.
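
For illustration only, a scaled-down sketch of a dict with the shape
described above might look like the following.  The function name, the
default sizes, and the use of random data are assumptions made for the
example (the real data is not random), and it is written for Python 3
rather than the 2.5.2 interpreter discussed in this thread.

```python
import random
import string

def build_sample_dict(n_keys=1000, n_pairs=5000, n_strings=60, seed=0):
    """Build a small dict shaped like the one described in the post.

    All sizes are hypothetical stand-ins; the post mentions roughly
    10-100M keys, 100-200M pairs, and a pool of about 60K strings.
    """
    rng = random.Random(seed)

    def rand_str(alphabet, length):
        return "".join(rng.choice(alphabet) for _ in range(length))

    # Keys: uppercase alpha strings of length 7.
    keys = [rand_str(string.ascii_uppercase, 7) for _ in range(n_keys)]
    # Pair strings: length 30, drawn from a small fixed pool so the
    # same string objects are shared across many pairs.
    pool = [rand_str(string.ascii_lowercase, 30) for _ in range(n_strings)]

    d = {}
    for _ in range(n_pairs):
        # Each value is a list of (string, int) pairs, int in range(10000).
        pair = (rng.choice(pool), rng.randrange(10000))
        d.setdefault(rng.choice(keys), []).append(pair)
    return d

sample = build_sample_dict()
total_pairs = sum(len(v) for v in sample.values())
print(len(sample), "keys,", total_pairs, "pairs")
```

Scaling n_keys and n_pairs up toward the figures above is what would
reproduce the memory footprint; the small defaults only show the layout.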

(If you're curious about the application domain, see 'http://greylag.org'.)

> Have you seen any significant difference in the exit time when the
> cyclic GC is disabled or enabled?

Unfortunately, with GC enabled, the application is too slow to be
useful, because of the greatly increased time for dict creation.  I
suppose it's theoretically possible that with this increased time, the
long time for exit will look less bad by comparison, but I'd be
surprised if it makes any difference at all.  I'm confident that there
are no reference cycles in this dict, and nothing for the cyclic GC to collect.
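
As a rough way to compare dict creation time with the cyclic GC
enabled and disabled, something along these lines could be used.  It
is a minimal sketch assuming Python 3 (time.perf_counter is not
available in 2.5.2); build() and timed_build() are hypothetical
helpers, the sizes are arbitrary, and the gap on a recent interpreter
may be much smaller than what is described above.

```python
import gc
import time

def build(n_pairs, n_keys=1000):
    """Create a dict of lists of (string, int) pairs."""
    d = {}
    for i in range(n_pairs):
        # Each tuple is a container object the collector has to track.
        d.setdefault(i % n_keys, []).append(("s", i % 10000))
    return d

def timed_build(n_pairs, use_gc):
    """Time build() with the cyclic GC switched on or off."""
    was_enabled = gc.isenabled()
    if use_gc:
        gc.enable()
    else:
        gc.disable()
    try:
        start = time.perf_counter()
        d = build(n_pairs)
        elapsed = time.perf_counter() - start
    finally:
        # Restore whatever collector state the caller had.
        if was_enabled:
            gc.enable()
        else:
            gc.disable()
    return d, elapsed

for use_gc in (True, False):
    _, secs = timed_build(200000, use_gc)
    print("gc enabled" if use_gc else "gc disabled", "%.3f s" % secs)
```

This only measures creation; it says nothing about exit time, which
would need to be timed from outside the process.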

Mike

