Tremendous slowdown due to garbage collection

Carl Banks pavlovevidence at gmail.com
Sat Apr 12 10:20:13 EDT 2008


On Apr 12, 7:02 am, andreas.eis... at gmail.com wrote:
> I would suggest to configure the default behaviour of the garbage
> collector in such a way that this squared complexity is avoided
> without requiring specific knowledge and intervention by the user. Not
> being an expert in these details I would like to ask the gurus how
> this could be done.

Well, the garbage collector activates whenever allocations exceed
deallocations by a certain threshold, which is exactly why it becomes
the bottleneck here: while you build a huge structure almost nothing
is being deallocated, so the collector keeps firing, and each pass has
to traverse every container object created so far.  That is where the
squared complexity comes from.
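
The usual user-level workaround is simply to suspend the collector
while the structure is built and run one full pass afterwards.  A
minimal sketch (build_big_list is just a stand-in for whatever
allocation-heavy code you actually have):

import gc

def build_big_list(n):
    # stand-in for the real construction code: lots of small containers
    return [{"index": i} for i in range(n)]

gc.disable()              # stop the cyclic collector during construction
try:
    data = build_big_list(10000000)
finally:
    gc.enable()           # restore normal behaviour
    gc.collect()          # one full pass to catch anything cyclic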

My first stab at a suggestion would be to adjust the threshold
depending on how successful it is.  So if a garbage collection run
collects no objects, double (or whatever) the threshold until the next
run.

More formulaically, if your threshold is X, and a garbage collection
pass collects Y objects, the threshold for the next pass would be
something like 2*X/(Y/X+1).  Then you can create your very large data
structure in only O(N log N) time.
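
Here is a user-level sketch of that rule, just to make the arithmetic
concrete (the manual gc.collect() call and the function name are mine;
CPython's collector does not adjust its own thresholds like this):

import gc

def adapt_threshold():
    """One manual pass with the rule new_X = 2*X/(Y/X + 1):
    a fruitless pass (Y == 0) doubles the threshold, a pass that
    collects about X objects leaves it unchanged, and a very
    productive pass lowers it."""
    x, t1, t2 = gc.get_threshold()
    y = gc.collect()                       # objects found unreachable
    new_x = max(1, int(2 * x / (float(y) / x + 1)))
    gc.set_threshold(new_x, t1, t2)
    return new_x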


But as Steve Holden said, it'll probably mess someone else up.
There's not much you can do with a garbage collector to make it please
everyone.


A question often asked--and I am not a big fan of these sorts of
questions, but it is worth thinking about--of people who are creating
very large data structures in Python is "Why are you doing that?"
That is, you should consider whether some kind of database solution
would be better.  You mention lots of dicts--it sounds like some
balanced B-trees with disk loading on demand could be a good choice.
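
For example, a rough sketch of the disk-loading-on-demand idea using
the standard library's shelve module, which gives you a dict-like,
dbm-backed store (the file name and record layout here are made up
for illustration):

import shelve

store = shelve.open("bigdata.db")        # dict-like object backed by disk
try:
    for i in range(1000000):
        store[str(i)] = {"payload": i * 2}   # shelve keys must be strings
    print(store["42"])                   # loaded from disk on demand
finally:
    store.close()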


Carl Banks


