Cyclops 0.9.4

Tim Peters tim_one at email.msn.com
Fri Jul 23 03:23:21 EDT 1999


[Tim]
> You want to know about objects still living that you don't expect
> to be living.  No system can answer that for you!  They can't know
> what you expect <wink>.

[Greg Ewing]
> I'm quite happy to go through the list of living objects
> and decide for myself which ones should be dead.

Nobody can sift thru hundreds of thousands of live objects by eye, and that
many live objects is routine in large long-running apps.

> What I meant was, what I really want to know is how those objects
> got reached.

Well, 100,000 objects can often be reached in 500,000 ways via one direct
link -- & it gets messier the longer the path length you consider.  You have
to reduce the set of objects that are potentially interesting; Cyclops
requires you to identify them in advance; I'm not sure how it could be
easier to identify them later.

> If there are cycles involved, it wouldn't hurt to be told about
> them, but they're not the most important thing.

Cycles do have special status in CPython, though, and I find them the most
fruitful thing to look at first.  Still, the more of them I see, the more I
agree they're not the whole banana they've been made out to be.

>> If what you want is a list of all reachable objects,

> No, I'd like a list of *all* objects, reachable or not.

All objects ever allocated, or all objects whose refcounts haven't yet
fallen to 0?

The former implies no object memory is ever recycled, which would make it
impractical for the large long-running apps most in *need* of help.

The latter requires a way to forget objects whose refcounts have reached 0,
and so likely requires a doubly-linked list of objects (so that dead
non-cyclic objects can unlink themselves efficiently when they die).  Since
"even ints are boxed" in Python, that's a major memory hit.

A short while back Guido pitched a compromise, maintaining a list of all (&
only) allocated dicts (whether dicts whose refcount had reached zero would
be purged from this list wasn't resolved).  Since each class and instance
object has a dict, this probably allows getting at 99% of problem cases,
although descriptor-based slicing of large array objects seems to account
for all major leaks reported by others on c.l.py this month <wink/sigh>.
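For comparison, later CPython releases grew something in the spirit of that compromise: the cycle collector's `gc.get_objects()` returns every container object it tracks, which includes class instances. A minimal sketch, assuming a modern CPython, of using it to count live instances per type so that a leak shows up as a count that keeps climbing between snapshots (the helper name `live_counts` is made up here):

```python
import gc
from collections import Counter


def live_counts(*types):
    """Count gc-tracked live objects whose type is one of `types`, or every
    tracked type if none are given. Only container objects are tracked, so
    e.g. ints and strings never show up here."""
    counts = Counter()
    for obj in gc.get_objects():
        t = type(obj)
        if not types or t in types:
            counts[t.__name__] += 1
    return counts


if __name__ == "__main__":
    class Leaky:
        pass

    keep = [Leaky() for _ in range(5)]
    print(live_counts(Leaky))
```

Taking two snapshots and diffing the Counters is usually more telling than staring at one of them.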

> Cyclops could then do what it does now, but for the isolated islands
> of garbage as well.

Isolated garbage in CPython necessarily contains cycles, and all isolated
garbage (cyclic or not) is necessarily reachable *from* cycles, so Cyclops
has *a* handle on that now.  It doesn't need more information so much as it
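The "necessarily contains cycles" claim is easy to watch happen. This sketch relies on the `gc` module and weak references of a later CPython (a cycle collector didn't exist when this was written). With the collector disabled, only reference counting is in charge, so an island made of a cycle plus a non-cyclic object hanging off it survives even though nothing outside points at it. `Node` and `make_island` are illustrative names only.

```python
import gc
import weakref


class Node:
    pass


def make_island():
    """Build an island of garbage: a two-object cycle plus a non-cyclic
    object that is reachable only *from* the cycle. Returns weak references
    so the caller can watch the island without keeping it alive."""
    a, b, tail = Node(), Node(), Node()
    a.partner, b.partner = b, a   # the cycle
    a.tail = tail                 # not cyclic itself, but hangs off the cycle
    return weakref.ref(a), weakref.ref(tail)


def demo():
    was_enabled = gc.isenabled()
    gc.disable()  # leave plain reference counting in charge
    try:
        ra, rtail = make_island()
        # No name refers to the island any more, yet no refcount reached 0,
        # so both the cycle and the object hanging off it are still alive.
        before = (ra() is not None, rtail() is not None)
        gc.collect()  # a cycle-aware collector can reclaim the whole island
        after = (ra() is None, rtail() is None)
    finally:
        if was_enabled:
            gc.enable()
    return before, after


if __name__ == "__main__":
    print(demo())  # ((True, True), (True, True))
```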
needs ways to present less to the user.  Let's say you have everything you
want, and even more:  for every object ever allocated, Cyclops2 can give you
a timestamped history of every change ever made to it, every path from which
it could be reached at any given time, and similarly for every object
reachable from it.

Then what?  Beats me -- every app seems different, and each requires a lot
of thought to whittle the last trillion machine cycles down to the seven
that matter.  More info doesn't help without a clear plan for exploiting it.

>> All of those are also needed if Python is ever to move toward builtin
>> portable mark-&-sweep (optional or not).  The advantage over "walking an
>> (explicit) list" is no overhead (time or space) unless & until
>> it's used.

> If there were mark & sweep there wouldn't be unreachable
> garbage, so in that case you are right -- there would be
> no point in keeping a list. I'm only suggesting it as an
> interim measure.

No, what I described is essential for the M half of portable M&S, but on top
of that a list is essential for the S half.  Without a list of allocated
objects, Python *can't find* the "isolated islands of garbage" in order to
clean them up -- it could mark all reachable objects, but couldn't sweep the
unreachable ones.
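A toy heap shows the split between the two halves. This is a minimal mark-&-sweep sketch with hypothetical names, not CPython's design. Marking only needs a way to enumerate each object's outgoing references (here the `refs` list). Sweeping needs the `allocated` list: nothing else knows the island `x <-> y` exists once no root points at it.

```python
class Obj:
    """A heap object whose outgoing references are its `refs` list."""

    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


class Heap:
    def __init__(self):
        # Every allocation is recorded here; without this list the sweep
        # phase would have no way to find objects nothing points to.
        self.allocated = []

    def alloc(self, name):
        obj = Obj(name)
        self.allocated.append(obj)
        return obj

    def mark(self, roots):
        """M half: needs only a way to enumerate each object's referents."""
        stack = list(roots)
        while stack:
            obj = stack.pop()
            if not obj.marked:
                obj.marked = True
                stack.extend(obj.refs)

    def sweep(self):
        """S half: walk the allocation list, freeing whatever wasn't marked."""
        survivors, freed = [], []
        for obj in self.allocated:
            (survivors if obj.marked else freed).append(obj)
            obj.marked = False  # reset for the next collection
        self.allocated = survivors
        return [o.name for o in freed]

    def collect(self, roots):
        self.mark(roots)
        return self.sweep()


if __name__ == "__main__":
    heap = Heap()
    root, a, x, y = (heap.alloc(n) for n in "root a x y".split())
    root.refs.append(a)
    x.refs.append(y)
    y.refs.append(x)            # an isolated island (a cycle)
    print(heap.collect([root]))  # ['x', 'y']
```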

You should take that as good news:  if Python is ever to *reach* builtin
portable M&S, the list you want will be there in one form or another.  But
in my repeated experience here, unreachable garbage is not the problem --
it's reachable stuff you expected *would* be garbage but isn't; and a list
isn't needed to find that.

>> then a Cyclops-like thingy could tell you not only that something
>> unexpected is still alive, but also from where it can be reached

> I don't understand why Cyclops couldn't be made to do
> that now. It reached those objects somehow -- all it needs
> to do is remember how!

I think you should try using it once <wink>.  You have to explicitly
register the set of objects you're curious about, and in practice *that's*
(i.e., your registration) how it "reached them".  "Yes, object X is
reachable in 12 ways, and here are the ways I know about:  (1) you
registered it.  (2) umm ... well, no other way I can find -- register more
stuff, chase more types, filter less away, and try again".
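A much-reduced stand-in shows why registration bounds what it can report. This is not Cyclops's code or API. `MiniFinder` and `referents` are hypothetical, and the set of reference kinds chased (instance attributes, list/tuple items, dict values) is an arbitrary choice, loosely like choosing which types to chase. It does a depth-first walk from the registered objects and reports each back edge it finds as a cycle, so a cycle that nothing registered leads to simply doesn't exist as far as it knows.

```python
def referents(obj):
    """Outgoing references we choose to chase: instance attributes and the
    items of lists, tuples and dict values. Everything else is opaque."""
    if isinstance(obj, (list, tuple)):
        return list(obj)
    if isinstance(obj, dict):
        return list(obj.values())
    if hasattr(obj, "__dict__") and not isinstance(obj, type):
        return list(vars(obj).values())
    return []


class MiniFinder:
    """Hypothetical Cyclops-alike: it only knows about objects it can reach
    from what you register."""

    def __init__(self):
        self.registered = []

    def register(self, obj):
        self.registered.append(obj)

    def find_cycles(self):
        cycles, done = [], set()
        for root in self.registered:
            self._dfs(root, [], {}, done, cycles)
        return cycles

    def _dfs(self, obj, path, on_path, done, cycles):
        key = id(obj)
        if key in on_path:  # back edge: the path from there to here is a cycle
            cycles.append(path[on_path[key]:])
            return
        if key in done:     # already fully explored from an earlier start
            return
        on_path[key] = len(path)
        path.append(obj)
        for child in referents(obj):
            self._dfs(child, path, on_path, done, cycles)
        path.pop()
        del on_path[key]
        done.add(key)


if __name__ == "__main__":
    class Node:
        pass

    a, b, c = Node(), Node(), Node()
    a.next, b.next = b, a             # a <-> b cycle
    finder = MiniFinder()
    finder.register(c)                # c has nothing to do with the cycle
    print(finder.find_cycles())       # []
    finder.register(a)
    print(len(finder.find_cycles()))  # 1
```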

There was once another pile of code to trace out all ways of reaching each
registered object from all other registered objects, but that didn't turn up
anything but cycles (which it finds more cheaply via other means), plus
hundreds of thousands of useless paths ("OK, A.x can be reached from A"), so
I tossed it.  I think I'll put it back in, though -- one of these days it's
bound to stumble into something worth finding even if by accident <0.6
wink>.
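Here's a guess at the shape of that tossed path tracer, as a hypothetical sketch rather than the code that was actually removed. For every ordered pair of registered objects it does a breadth-first search for a shortest labeled path from one to the other. It chases only instance attributes, list/tuple indices and dict values, and `limit` caps how many objects one search may visit. Registering objects that contain each other mostly yields exactly the useless "A.x can be reached from A" sort of output.

```python
from collections import deque


def labeled_referents(obj):
    """(label, child) pairs for the references we choose to chase."""
    if isinstance(obj, (list, tuple)):
        return [(f"[{i}]", v) for i, v in enumerate(obj)]
    if isinstance(obj, dict):
        return [(f"[{k!r}]", v) for k, v in obj.items()]
    if hasattr(obj, "__dict__") and not isinstance(obj, type):
        return [(f".{k}", v) for k, v in vars(obj).items()]
    return []


def shortest_path(src, dst, limit=10_000):
    """BFS from src; return the label suffix of a shortest path to dst,
    or None if dst isn't found within `limit` visited objects."""
    seen = {id(src)}
    queue = deque([(src, "")])
    while queue and len(seen) <= limit:
        obj, path = queue.popleft()
        for label, child in labeled_referents(obj):
            if child is dst:
                return path + label
            if id(child) not in seen:
                seen.add(id(child))
                queue.append((child, path + label))
    return None


def trace_registered(named):
    """For every ordered pair of registered objects, report how the second
    can be reached from the first. `named` maps a display name to an object."""
    report = []
    for src_name, src in named.items():
        for dst_name, dst in named.items():
            if src is dst:
                continue
            path = shortest_path(src, dst)
            if path is not None:
                report.append(f"{dst_name} reachable from {src_name} as {src_name}{path}")
    return report


if __name__ == "__main__":
    class Obj:
        pass

    A = Obj()
    A.x = Obj()
    A.x.y = [Obj()]
    for line in trace_registered({"A": A, "A.x": A.x, "A.x.y[0]": A.x.y[0]}):
        print(line)
```

Every line this prints for the example is of the "OK, A.x can be reached from A" variety, and the number of pairs grows quadratically with the number of registered objects.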

> I don't mean to sound ungrateful for Cyclops, by the way --
> it's a great idea.

Greg, it's much more than an idea:  it's code you can use <smile>.

> All it needs is a little bit more help from the Python core to make
> it even greater...

The core help I think it could best exploit was covered in the preceding msg
(namely, the "M" half of M&S machinery).  That's not enough to find
unreachable islands unless an element of the island was registered, but
those appear to be the tail of the leaky dog.

canine-metaphors-in-service-of-a-barfworthy-problem-ly y'rs  - tim






More information about the Python-list mailing list