Tremendous slowdown due to garbage collection

Dieter Maurer dieter at handshake.de
Mon Apr 28 12:44:57 EDT 2008


"Martin v. Löwis" wrote at 2008-4-27 19:33 +0200:
>> Martin said it but nevertheless it might not be true.
>> 
>> We observed similar very bad behaviour -- in a Web application server.
>> Apparently, the standard behaviour is far from optimal when the
>> system contains a large number of objects and occasionally, large
>> numbers of objects are created in a short time.
>> We have seen such behaviour during parsing of larger XML documents, for
>> example (in our Web application).
>
>I don't want to claim that the *algorithm* works well for all typical
>applications. I just claim that the *parameters* of it are fine.
>The OP originally proposed to change the parameters, making garbage
>collection run less frequently. This would a) have bad consequences
>in terms of memory consumption on programs that do have allocation
>spikes, and b) have no effect on the asymptotic complexity of the
>algorithm in the case discussed.

In our case, it helped to change the parameters:

  As usual in Python, cyclic garbage is very rare in our case.
  On the other hand, we have large caches with lots of objects,
  i.e. a large number of long-lived objects.
  Each generation 2 collection visits the complete set of
  tracked objects, so if it runs too often, performance can
  deteriorate drastically.

  In our case, the main problem has not been the runtime
  itself but the fact that the GIL is held during GC
  (understandably). This meant that every few minutes we saw
  a scheduling stall on the order of 10 to 20 s (the duration
  of our generation 2 collection).
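
To get a feel for the kind of pause described above, a minimal sketch
(not from the original application; the object counts and timings are
purely illustrative) could time a full collection against a large
population of long-lived container objects:

  import gc
  import time

  # Build a large population of long-lived container objects, roughly
  # mimicking an application-level cache.  Containers are what the
  # cyclic collector has to traverse on every generation 2 collection.
  cache = [{"key": i, "payload": [i] * 4} for i in range(2000000)]

  # Time a full (generation 2) collection while those objects stay alive.
  start = time.time()
  unreachable = gc.collect(2)
  elapsed = time.time() - start

  print("tracked objects: %d" % len(gc.get_objects()))
  print("unreachable objects found: %d" % unreachable)
  print("generation 2 collection took %.3f s" % elapsed)

The pause grows roughly linearly with the number of live container
objects, independently of how much cyclic garbage is actually found.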

We changed the parameters so that generation 2 collections
happen at about 1/1000 of their former frequency.
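
Such a reconfiguration could look roughly like this (gc.set_threshold
is the standard knob; the factor below mirrors the 1/1000 figure above
but is otherwise illustrative):

  import gc

  # Default thresholds are (700, 10, 10): a generation 0 collection
  # runs after roughly 700 net container allocations, every 10th
  # generation 0 collection also collects generation 1, and every 10th
  # generation 1 collection also collects generation 2.
  threshold0, threshold1, threshold2 = gc.get_threshold()

  # Multiply the last threshold by 1000 so that full (generation 2)
  # collections become about 1000 times rarer, while collections of
  # the young generations keep running as before.
  gc.set_threshold(threshold0, threshold1, threshold2 * 1000)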


I do not argue that Python's default GC parameters must change -- only
that applications with lots of objects may want to consider a
reconfiguration.



-- 
Dieter


