[pypy-dev] pypy GC on large objects Re: funding/popularity?

Paolo Giarrusso p.giarrusso at gmail.com
Tue Jan 4 15:50:50 CET 2011


On Tue, Jan 4, 2011 at 09:40,  <Ben.Young at sungard.com> wrote:
>
>
>> -----Original Message-----
>> From: pypy-dev-bounces at codespeak.net [mailto:pypy-dev-
>> bounces at codespeak.net] On Behalf Of Paolo Giarrusso
>> Sent: 24 December 2010 11:39
>> To: Dima Tisnek
>> Cc: PyPy Dev; Armin Rigo
>> Subject: Re: [pypy-dev] pypy GC on large objects Re:
>> funding/popularity?
>>
>> On Thu, Dec 23, 2010 at 20:30, Dima Tisnek <dimaqq at gmail.com> wrote:
>> > Basically collecting this is hard:
>> >
>> > dict(a=range(9**9))
>> >
>> > large list is referenced, the object that holds the only reference is
>> > small no matter how you look at it.
>> First, usually (in most GC-ed languages) you can collect the list
>> before the dict. In PyPy, if finalizers are involved (is this the case
>> here? That'd be surprising), this is no longer true.
>>
>> However, object size is not the point. For standard algorithms, the
>> size of an object does not matter at all in deciding when it's
>> collected - I already discussed this in my other email in this thread,
>> and I noted what actually could happen in the examples described by
>> Armin, and your examples show that it is a good property. A large
>> object in the same heap can fill it up and trigger an earlier garbage
>> collection.
>>
>> In general, if GC ran in the background (but it usually doesn't, and
>> not in PyPy) it could make sense to free objects sooner or later,
>> depending not on object size, but on "how much memory would be
>> 'indirectly freed' by freeing this object". However, because of
>> sharing, answering this question is too complex (it requires
>> collecting data from the whole heap). Moreover, the whole thing makes
>> no sense at all with usual, stop-the-world collectors: the app is
>> stopped, then the whole young generation, or the whole heap, is
>> collected, then the app is resumed.
>>
>> When separate heaps are involved (such as with ctypes, or with Large
>> Object Spaces, which avoid using a copy collector for large objects),
>> it is more complicated to ensure that the same property holds: you
>> need to consider stats of all heaps to decide whether to trigger GC.
>>
>> > I guess it gets harder still if there are many small live objects, as
>> > getting to this dict takes a while
>> > (easier in this simple case with generational collector, O(n) in
>> general case)
>>
>> Not sure what you mean; I can make sense of it (not fully) only with
>> an incremental collector, and they are still seldom used (especially,
>> not in PyPy).
>>
>> Best regards
>>
>> > On 23 December 2010 06:38, Armin Rigo <arigo at tunes.org> wrote:
>> >> Hi René,
>> >>
>> >> On Thu, Dec 23, 2010 at 2:33 PM, René Dudfield <renesd at gmail.com>
>> wrote:
>> >>> I think this is a case where the object returned by
>> >>> ctypes.create_string_buffer() could use a correct __sizeof__ method
>> >>> return value.  If pypy supported that, then the GC's could support
>> >>> extensions, and 'opaque' data structures in C too a little more
>> >>> nicely.
>> >>
>> >> I think you are confusing levels.  There is no way the GC can call
>> >> some app-level Python method to get information about the objects it
>> >> frees (and when would it even call it?).  Remember that our GC is
>> >> written at a level where it works for any interpreter for any
>> >> language, not just Python.
>> >>
>
>
> .NET supports calls to GC.AddMemoryPressure and GC.RemoveMemoryPressure to inform the GC you are allocating things outside of its knowledge. Maybe something similar would help?

That's interesting as well. Armin and I discussed something similar in
another branch of this thread, and he included it among the planned
ideas:
http://codespeak.net/pipermail/pypy-dev/2010q4/006648.html
http://codespeak.net/pipermail/pypy-dev/2010q4/006649.html

The difference is that in my proposal one would hook the memory
allocator for Python extensions, whereas .NET requires adding explicit
calls to the source code. However, the key idea is the same: you might
need to GC sooner if there is lots of unmanaged memory.
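
To make the key idea concrete, here is a minimal Python sketch of
explicit memory-pressure accounting. It is only an illustration: the
MemoryPressure class, its method names and the 64 MiB threshold are
all hypothetical, and it does not reflect the heuristics .NET actually
uses, nor anything implemented in PyPy.

```python
import gc


class MemoryPressure:
    """Hypothetical tracker for memory allocated outside the GC's knowledge."""

    def __init__(self, threshold_bytes):
        # Assumed, arbitrary trigger point; a real GC would tune this.
        self.threshold_bytes = threshold_bytes
        self.pending_bytes = 0   # unmanaged bytes reported since the last collection
        self.collections = 0     # number of collections forced by this tracker

    def add(self, nbytes):
        """Report nbytes of unmanaged memory; collect once the threshold is reached."""
        if nbytes < 0:
            raise ValueError("nbytes must be non-negative")
        self.pending_bytes += nbytes
        if self.pending_bytes >= self.threshold_bytes:
            gc.collect()
            self.collections += 1
            self.pending_bytes = 0

    def remove(self, nbytes):
        """Report that nbytes of unmanaged memory were released."""
        self.pending_bytes = max(0, self.pending_bytes - nbytes)


pressure = MemoryPressure(threshold_bytes=64 * 1024 * 1024)
pressure.add(48 * 1024 * 1024)   # below the threshold: no collection yet
pressure.add(32 * 1024 * 1024)   # 80 MiB pending: reaches the threshold, collects
```

With explicit calls, as in .NET, the extension author would call add()
and remove() by hand; with a hooked allocator, the same accounting
would happen inside the allocation and deallocation routines instead.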

Unfortunately, MSDN docs about those methods do not give pointers to
the heuristics used:
http://msdn.microsoft.com/en-us/library/system.gc.addmemorypressure.aspx
http://msdn.microsoft.com/en-us/library/system.gc.removememorypressure.aspx

Best regards
-- 
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/


