Is this a bug? Python intermittently stops dead for seconds

Steve Holden steve at holdenweb.com
Sun Oct 1 09:38:30 EDT 2006


charlie strauss wrote:
> Below is a simple program that will cause Python to intermittently
> stop executing for a few seconds.  It's 100% reproducible on my machine.
> 
> I'd be tempted to say this is a nasty garbage collection performance
> issue, except that there is no major memory to be garbage collected in
> this script.  I'd be tempted to say it was a Unix virtual memory
> issue, except this is occurring at around 1/5th of my physical memory
> size.  So something else unexplained is going on.
> 
> Class Foo instances create and hold a list of size nfoo of integers.
> (nfoo = 10)
> 
> Class Bar instances create and hold a list of size nbar of Foo
> objects. (nbar = 100)
> 
> When the code runs it starts creating and storing Bar objects in a  
> list while watching for real-time glitches in how long it takes to  
> create the next Foo object.  If this is longer than 1/2 of a second  
> then it reports it.
> 
> On my computer, after creating 1500 Bar objects, the rate of
> creation of new Foo objects suddenly develops a periodic glitch.  This
> glitch recurs about every 300 Bar objects, and the duration of the
> glitch keeps getting longer, growing to many seconds!!!!
> 
> Platform: 800MHz PowerMac G4, 1GB of memory
> Python: 2.4.2
> 
> Note: since I am using an absolute time threshold for reporting the
> glitches, the first one may take more iterations before it occurs on
> a fast computer.  You may need to increase nbar or nfoo.
> 
> import sys
> from time import time
> 
> 
> class Foo(object):
>      def __init__(me,nfoo):
>          me.memory = [1]*nfoo
> 
> class Bar(object):
>      def __init__(me,nbar,nfoo):
>          tgx.set_tag("Bar")  # timer
> 
>          b = [None]*nbar
>          for i in xrange(nbar):
>              b[i]=Foo(nfoo)  # make a foo, add it to list
>              tgx.jump("Bar"+`i`)  #timer
> 
>          me.b = b  #store the list in my instance memory
> 
> 
> 
> 
> # just a utility class to time things.
> class gtime:
>      def __init__(me,f=sys.stderr):
>          me.last = time()
>          me.f=f
>          me.tag = ""
>          me.ticks = 0
> 
>      def set_tag(me,tag):
>          me.tag = tag
>          me.ticks = 0
> 
>      def reset(me):
>          me.ticks = 0
> 
>      def jump(me,tag="NONE",allowed_gap=0.5):
>          me.t = time()
>          me.ticks +=1
>          if me.t-me.last>allowed_gap:
>              print >>me.f,"Big Gap:",me.t-me.last,"seconds ",me.tag,tag,me.ticks
>          me.last = time()
> 
> tgx = gtime() # instance of the timer
> 
> 
> # main loop
> nbar = 100
> nfoo = 10
> 
> ad_nauseum = 20000
> final = [None]*ad_nauseum
> 
> for i in xrange(ad_nauseum ):
>      if i%100 == 0 :print >>sys.stderr,"Bars Made: ",i
>      final[i] = Bar(nbar,nfoo)
> 
I'm not the Timbot, but the behaviour of the program seems reasonably 
comprehensible to me. You create a list of 20,000 elements and proceed 
to populate it with Bar objects, each of which holds a hundred-element 
list of Foo objects, each of which in turn holds a ten-element list of 
integers.

So although every element of the innermost lists is (a reference to) the 
integer 1, there are a lot of references, and references take up memory. 
Between one progress report and the next you create another 100 Bars, 
which is 100 * 100 * 10 = 100,000 further references to 1.
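
Just to put numbers on that (my arithmetic, using the constants from the 
script):

import sys

# Rough count of list slots the posted script builds up, using its own
# constants: nbar = 100, nfoo = 10, ad_nauseum = 20000.
nbar, nfoo, ad_nauseum = 100, 10, 20000

refs_per_bar = nbar * nfoo                    # 1,000 references to 1 per Bar
refs_per_report = 100 * refs_per_bar          # 100,000 more between progress reports
refs_if_finished = ad_nauseum * refs_per_bar  # 20,000,000 if the loop ever completed

print >>sys.stderr, refs_per_bar, refs_per_report, refs_if_finished

And every ten of those references also costs you a Foo instance and its 
list object, so the number of live container objects grows just as fast.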

So it's hardly surprising that at some point the garbage collector cuts 
in and starts looking for free memory, which takes a certain amount of time.

It's also unsurprising that the garbage collector takes longer the more 
stuff you have in memory, because it has to sweep all those structures 
to make sure that there aren't any collectible cycles.
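
For reference (this is just the standard gc module, nothing specific to 
your script): the collector is triggered by counts of container 
allocations rather than by memory pressure, and a full collection has to 
examine every container object still alive, which is why the pauses keep 
growing. You can inspect and raise the trigger points:

import gc

# Default thresholds: a generation-0 collection roughly every 700 net
# container allocations; every 10th of those also scans generation 1,
# and every 10th of *those* scans generation 2, i.e. everything alive.
print gc.get_threshold()         # typically (700, 10, 10)

# Raising threshold0 makes all collections, including the expensive
# full ones, much rarer (a sketch, not a tuned recommendation):
gc.set_threshold(100000, 10, 10)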

As I run the program I see virtual memory usage climbing steadily - 
hardly surprising given the nature of the program. At some point the 
working set of the program is going to exceed the amount of physical 
memory that the operating system can allocate to it.
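
If you want to watch that from inside the process rather than from a 
system monitor, something along these lines (my sketch, Unix only, not 
part of the original script) will show the major page fault count 
climbing once paging starts:

import sys
import resource

def report_memory(tag=""):
    # ru_majflt counts page faults that needed disk I/O, so a rising
    # number here means real paging is happening.  ru_maxrss units vary
    # by platform (kilobytes on Linux, bytes on Mac OS X).
    u = resource.getrusage(resource.RUSAGE_SELF)
    print >>sys.stderr, tag, "majflt:", u.ru_majflt, "maxrss:", u.ru_maxrss

Calling that from the "Bars Made" branch of the main loop would line the 
fault counts up with the progress output.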

At that point paging will start to occur, and garbage collection (which 
has to access more or less all of memory) is now triggering disk 
activity, making it slower by a significant margin. This didn't happen 
on my machine with your initial settings, as I have plenty of memory, but 
when I increased nfoo to 100, by the time I'd allocated 13,000 Bars I 
was seeing big "Big Gap"s and little "Big Gap"s, the big ones apparently 
caused by the need to page something else out before additional space 
could be allocated to the task:

Bars Made:  12900
Big Gap: 3.78899979591 seconds Bar Bar95 96
Bars Made:  13000
Big Gap: 0.500999927521 seconds Bar Bar89 90
Bars Made:  13100
Bars Made:  13200
Big Gap: 3.72300004959 seconds Bar Bar65 66
Bars Made:  13300
Big Gap: 0.500999927521 seconds Bar Bar72 73
Bars Made:  13400
Bars Made:  13500
Big Gap: 3.83699989319 seconds Bar Bar35 36
Bars Made:  13600
Bars Made:  13700
Big Gap: 0.500999927521 seconds Bar Bar65 66
Bars Made:  13800
Big Gap: 3.90399980545 seconds Bar Bar4 5
Bars Made:  13900
Bars Made:  14000
Bars Made:  14100
Big Gap: 3.95200014114 seconds Bar Bar75 76
Bars Made:  14200
Big Gap: 0.500999927521 seconds Bar Bar38 39
Bars Made:  14300
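
One quick way to check how much of each gap is the collector itself (a 
suggestion only; I didn't have it turned on for the timings above) is to 
switch on the gc module's per-collection statistics and see whether the 
big pauses coincide with generation-2 collections:

import gc

# Prints a line to stderr for every collection, showing which generation
# was collected and how many objects were examined.
gc.set_debug(gc.DEBUG_STATS)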

I trust readers will excuse me for attaching a small PNG graphic that 
clearly shows this behaviour: CPU utilisation climbs to around 100%, but 
then starts dropping intermittently - this *has* to be paging activity. 
The dive in CPU activity just after 300 seconds is where I hit CTRL/C; 
the subsequent 800 seconds were spent unwinding the memory so it could 
be deallocated!

Bars Made:  14400
Big Gap: 0.799000024796 seconds Bar Bar0 1
Traceback (most recent call last):
   File "test40.py", line 57, in <module>
     final[i] = Bar(nbar,nfoo)
   File "test40.py", line 15, in __init__
     b[i]=Foo(nfoo)  # make a foo, add it to list
   File "test40.py", line 6, in __init__
     def __init__(me,nfoo):
KeyboardInterrupt

[LOOOOOOOOONG PAUSE ...]
sholden at bigboy ~/Projects/Python
$

I might add that it took a long time for Windows performance to recover, 
since any task that I subsequently touched had to be brought back in 
from suspension in the paging file.

This is one of those cases where it would be faster to avoid using 
garbage collection at all, since no cyclic structures appear to be 
created. But there's an indication of just how much memory you are using 
in the length of time the program takes between reporting a 
KeyboardInterrupt and returning to the command line prompt: this is 
precisely the length of time it takes to deallocate all those lists and 
return them to the free memory pool.
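
Concretely, switching the collector off around the allocation loop is a 
one-line change (untested against the timings above, and only safe here 
because the script creates no reference cycles):

import gc

gc.disable()     # stop the cyclic collector; reference counting still
                 # frees everything when it goes out of scope

# ... build the Bar objects exactly as before ...

# gc.enable()    # re-enable afterwards if later code might create cycles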

So basically what you have here is a pathological example of why it's 
sometimes wise to disable garbage collection. Tim, did I miss anything?

regards
  Steve
-- 
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd          http://www.holdenweb.com
Skype: holdenweb       http://holdenweb.blogspot.com
Recent Ramblings     http://del.icio.us/steve.holden
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test40.png
Type: image/png
Size: 8451 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-list/attachments/20061001/8fc54cf4/attachment.png>

