[pypy-dev] Pypy garbage collection

Maciej Fijalkowski fijall at gmail.com
Mon Mar 17 08:21:25 CET 2014


there is an environment variable PYPYLOG=gc:- (where - is stdout)
which will do that for you btw.
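
for example (assuming the mem.py script quoted below; the exact set of gc
log sections can vary between pypy versions) a run could look like

    PYPYLOG=gc:- pypy mem.py 10000000          # gc log to stdout
    PYPYLOG=gc:gc.log pypy mem.py 10000000     # or write the log to a file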

maybe you can find out what that is using profiling or valgrind?
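
just a sketch (running under valgrind is slow and the output files can get
big), something along the lines of

    valgrind --tool=callgrind pypy mem.py 10000000
    callgrind_annotate callgrind.out.<pid>

where <pid> is whatever process id valgrind puts in the output file name.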

On Sun, Mar 16, 2014 at 11:34 PM, Martin Koch <mak at issuu.com> wrote:
> I have tried getting the pypy source and building my own version of pypy. I
> have modified rpython/memory/gc/incminimark.py:major_collection_step() to
> print out when it starts and when it stops. Apparently, the slow queries do
> NOT occur during major_collection_step; at least, I have not observed output
> from a major collection step during a query execution. So something else must
> be blocking. This could be another aspect of the GC, but it could also be
> anything else.
>
> Just to be sure, I have tried running the same application in python with
> garbage collection disabled. I don't see the problem there, so it is somehow
> related to either the GC or the PyPy runtime.
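>
> For reference, by "garbage collection disabled" I mean a run with the standard
> gc module's cyclic collector turned off (reference counting still frees
> objects), roughly:
>
> import gc
> gc.disable()              # turn off CPython's cyclic garbage collector
> # ... run the workload as usual ...
> assert not gc.isenabled()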
>
> Cheers,
> /Martin
>
>
> On Fri, Mar 14, 2014 at 4:19 PM, Martin Koch <mak at issuu.com> wrote:
>>
>> We have hacked up a small sample that seems to exhibit the same issue.
>>
>> We basically generate a linked list of objects. To increase connectedness,
>> elements in the list hold references (dummy_links) to 10 randomly chosen
>> previous elements in the list.
>>
>> We then time a function that traverses 50000 elements from the list from a
>> random start point. If the traversal reaches the end of the list, we instead
>> traverse one of the dummy links. Thus, exactly 50K elements are traversed
>> every time. To generate some garbage, we build a list holding the traversed
>> elements and a dummy list of characters.
>>
>> Timings for the last 100 runs are stored in a circular buffer. If the
>> elapsed time for the last run is more than twice the average time, we print
>> out a line with the elapsed time, the threshold, and the 90th percentile
>> runtime. We would like to see that the mean runtime does not increase with
>> the number of elements in the list (traversing 50K elements should be
>> independent of the total memory size), while the max time presumably does
>> increase (linearly with the number of objects, I guess).
>>
>> We have tried monitoring memory consumption by external inspection, but
>> cannot consistently verify that memory is deallocated at the same time that
>> we see slow requests. Perhaps the pypy runtime doesn't always return freed
>> pages back to the OS?
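>>
>> One way to cross-check from inside the process (a sketch that assumes Linux,
>> where /proc/self/status reports the current resident set size as VmRSS) would
>> be to sample something like this before and after each run:
>>
>> def rss_kb():
>>     # current resident set size in kB, read from /proc/self/status (Linux only)
>>     with open("/proc/self/status") as f:
>>         for line in f:
>>             if line.startswith("VmRSS:"):
>>                 return int(line.split()[1])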
>>
>> Using top, we observe that a run with 10M elements uses around 17GB after
>> building, 20M elements around 26GB, and 30M elements around 28GB (growing to
>> 35GB shortly after building).
>>
>> Here is output from a few runs with different number of elements:
>>
>>
>> pypy mem.py 10000000
>> start build
>> end build 84.142424
>> that took a long time elapsed: 13.230586  slow_threshold: 1.495401
>> 90th_quantile_runtime: 0.421558
>> that took a long time elapsed: 13.016531  slow_threshold: 1.488160
>> 90th_quantile_runtime: 0.423441
>> that took a long time elapsed: 13.032537  slow_threshold: 1.474563
>> 90th_quantile_runtime: 0.419817
>>
>> pypy mem.py 20000000
>> start build
>> end build 180.823105
>> that took a long time elapsed: 27.346064  slow_threshold: 2.295146
>> 90th_quantile_runtime: 0.434726
>> that took a long time elapsed: 26.028852  slow_threshold: 2.283927
>> 90th_quantile_runtime: 0.374190
>> that took a long time elapsed: 25.432279  slow_threshold: 2.279631
>> 90th_quantile_runtime: 0.371502
>>
>> pypy mem.py 30000000
>> start build
>> end build 276.217811
>> that took a long time elapsed: 40.993855  slow_threshold: 3.188464
>> 90th_quantile_runtime: 0.459891
>> that took a long time elapsed: 41.693553  slow_threshold: 3.183003
>> 90th_quantile_runtime: 0.393654
>> that took a long time elapsed: 39.679769  slow_threshold: 3.190782
>> 90th_quantile_runtime: 0.393677
>> that took a long time elapsed: 43.573411  slow_threshold: 3.239637
>> 90th_quantile_runtime: 0.393654
>>
>> Code below
>> --------------------------------------------------------------
>> import time
>> from random import randint, choice
>> import sys
>>
>>
>> allElems = {}
>>
>> class Node:
>>     def __init__(self, v_):
>>         self.v = v_
>>         self.next = None
>>         self.dummy_data = [randint(0,100)
>>                            for _ in xrange(randint(50,100))]
>>         allElems[self.v] = self
>>         if self.v > 0:
>>             self.dummy_links = [allElems[randint(0, self.v-1)] for _ in xrange(10)]
>>         else:
>>             self.dummy_links = [self]
>>
>>     def set_next(self, l):
>>         self.next = l
>>
>>
>> def follow(node):
>>     acc = []
>>     count = 0
>>     cur = node
>>     assert node.v is not None
>>     assert cur is not None
>>     while count < 50000:
>>         # return a value; generate some garbage
>>         acc.append((cur.v, [choice("abcdefghijklmnopqrstuvwxyz") for x in xrange(100)]))
>>
>>         # if we have reached the end, choose a random link
>>         cur = choice(cur.dummy_links) if cur.next is None else cur.next
>>         count += 1
>>
>>     return acc
>>
>>
>> def build(num_elems):
>>     start = time.time()
>>     print "start build"
>>     root = Node(0)
>>     cur = root
>>     for x in xrange(1, num_elems):
>>         e = Node(x)
>>         cur.next = e
>>         cur = e
>>     print "end build %f" % (time.time() - start)
>>     return root
>>
>>
>> num_timings = 100
>> if __name__ == "__main__":
>>     num_elems = int(sys.argv[1])
>>     build(num_elems)
>>     total = 0
>>     timings = [0.0] * num_timings  # run times for the last num_timings runs
>>     i = 0
>>     beginning = time.time()
>>     while time.time() - beginning < 600:
>>         start = time.time()
>>         elem = allElems[randint(0, num_elems - 1)]
>>         assert(elem is not None)
>>
>>         lst = follow(elem)
>>
>>         total += choice(lst)[0] # use the return value for something
>>
>>         end = time.time()
>>
>>         elapsed = end-start
>>         timings[i % num_timings] = elapsed
>>         if (i > num_timings):
>>             slow_time = 2 * sum(timings) / num_timings  # slow defined as > 2*avg run time
>>             if (elapsed > slow_time):
>>                 print "that took a long time elapsed: %f  slow_threshold: %f  90th_quantile_runtime: %f" % \
>>                     (elapsed, slow_time, sorted(timings)[int(num_timings*.9)])
>>         i += 1
>>     print total
>>
>>
>>
>>
>>
>> On Thu, Mar 13, 2014 at 7:45 PM, Maciej Fijalkowski <fijall at gmail.com>
>> wrote:
>>>
>>> On Thu, Mar 13, 2014 at 1:45 PM, Martin Koch <mak at issuu.com> wrote:
>>> > Hi Armin, Maciej
>>> >
>>> > Thanks for responding.
>>> >
>>> > I'm in the process of trying to determine what (if any) of the code I'm
>>> > in a
>>> > position to share, and I'll get back to you.
>>> >
>>> > Allowing hinting to the GC would be good. Even better would be a means
>>> > to
>>> > allow me to (transparently) allocate objects in unmanaged memory, but I
>>> > would expect that to be a tall order :)
>>> >
>>> > Thanks,
>>> > /Martin
>>>
>>> Hi Martin.
>>>
>>> Note that in case you want us to do the work of isolating the problem,
>>> we do offer paid support to do that (then we can sign NDAs and stuff).
>>> Otherwise we would be more than happy to fix bugs once you isolate a
>>> part you can share freely :)
>>
>>
>

