[Python-ideas] Automatic context managers

Jim Jewett jimjjewett at gmail.com
Fri Apr 26 21:45:58 CEST 2013


(Quoting from MRAB's quote, since I don't see the original -- I
suspect I also mangled some attributions internally)

On 26/04/2013 17:52, Chris Angelico wrote:
>  On Sat, Apr 27, 2013 at 1:54 AM, MRAB <python at mrabarnett.plus.com> wrote:
>> On 26/04/2013 14:02, anatoly techtonik wrote:

>>> This circular reference problem is interesting. In object space it
>>> probably looks like a stellar detached from the visible (attached)
>>> universe. Is the main problem in detecting it?


Yes.  That is where reference counting fails, and is the reason that
CPython added (cyclic) garbage collection.
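A minimal sketch of why reference counting alone misses cycles (the
class and function names here are illustrative, not from the thread):

```python
import gc

class Node:
    """Illustrative object that can take part in a cycle."""
    def __init__(self):
        self.partner = None

def make_cycle():
    a = Node()
    b = Node()
    a.partner = b   # a -> b
    b.partner = a   # b -> a
    # When this function returns, nothing outside the cycle refers
    # to a or b, but each still has refcount 1 (held by the other),
    # so pure reference counting never frees them.

gc.collect()                # start from a clean slate
make_cycle()
unreachable = gc.collect()  # the cycle detector finds and frees them
print(unreachable >= 2)     # at least the two Node objects were reclaimed
```

`gc.collect()` returns the number of unreachable objects it found, which
is how you can observe the cycle detector doing work that the reference
counter could not.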

Note that Garbage Collectors need to have a list of "roots" which can
keep things alive, and a way of recognizing links.  If some objects
(even those implemented in C) use pointers that the garbage collector
doesn't know about (e.g., by adding a constant to a base address
instead of storing the address directly, or storing tag bits in the
low-order portion of the address), then there will be objects that
cannot ever be safely collected.  Officially, that can be a bug in the
object implementation, but if it leads to a segfault, Python still
looks bad.


>> The problem is in knowing in which order the objects should be
>> collected.

This is a problem only once a garbage cycle has already been detected.
But it is indeed a major problem.

The above means that garbage collectors must look at every live object
in the entire system for every full collection; there is plenty of
research on how to speed things up (or even just make the system more
responsive) by doing "extra" work for partial collections.  (I put
"extra" in scare-quotes because these heuristics increase the
worst case and the theoretical average case, but often decrease the
normal-case workload.)

Of course, if you're paying this full price anyhow, why bother paying
the additional price of reference-counting?  (Because it is one of
those heuristics that actually save work in practice, if your data
isn't very cyclic.  But if you use a very cyclic style, or library...)

>> For example, if A refers to B and B refers to A, should you collect A
>> then B, or B then A? If you collect A first, then, for a time, B will
>> be referring to a non-existent object. That's not good if the objects
>> have destructors which need to be run.

> Once it's been proven that there's an unreferenced cycle, why not
> simply dispose of one of the objects, and replace all references to it
> (probably only one - preferably pick an object with the fewest
> references) with a special temporary object?

Backwards compatibility.  If my pointed-to object no longer has the
methods I expect (perhaps even just "close"), I will get exceptions.
They won't be the ones for which I was prepared.  Now, instead of
leaking a few resources (only until the program exits), I will be
exiting prematurely, perhaps without a chance to do other cleanup.
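For what it's worth, the answer CPython eventually settled on (PEP 442,
which landed in Python 3.4, shortly after this thread) sidesteps the
ordering question differently: finalizers in a cycle are all run
*before* any object in the cycle is torn down, so each __del__ still
sees its neighbour alive; only the order among the finalizers is
arbitrary.  A small sketch of that behaviour:

```python
import gc

deleted = []

class Resource:
    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        # Under PEP 442 (Python 3.4+) this runs while the whole cycle
        # is still intact, so self.other is a live object; only the
        # order of the two __del__ calls is arbitrary.
        deleted.append((self.name, self.other.name))

a = Resource("a")
b = Resource("b")
a.other, b.other = b, a   # a <-> b cycle
del a, b
gc.collect()              # run both finalizers, then break the cycle
print(sorted(deleted))    # both finalizers ran with live neighbours
```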

(MRAB wrote:)
  I wonder whether it would be best to call the __del__ method of the
  newest object (if it's possible to determine which is the newest) in
  such a case, then replace _that_ object with the DestructedObject (the
  "special marker" would be just a special "destructed" object).

You can get most of the way there with the object's address, and
farther with timestamping at creation (which also costs more memory).

But is the difference between 99.5% and 99.8% worth complicating things
and possibly breaking the last 0.2% more severely?

I *would* like a __close__ magic method that worked like __del__,
except that it would be OK to call as soon as you found the object
in a garbage cycle.  (This also means that the __close__ method's
contract should state explicitly that it might be called multiple times,
and cycles might be broken in an arbitrary order.)
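The standard library later grew something in this spirit:
weakref.finalize (added in Python 3.4) registers a callback that runs
at most once when the object is collected, and because the callback
must not reference the object itself, it cannot keep a cycle alive.
This is a hedged sketch of that mechanism, not the proposed __close__
itself:

```python
import gc
import weakref

closed = []

class Connection:
    """Illustrative resource-holding object."""
    def __init__(self, name):
        self.name = name
        self.peer = None

a = Connection("a")
b = Connection("b")
a.peer, b.peer = b, a          # a reference cycle

# The callback takes plain data, not the object, so it neither
# retains nor resurrects it; finalize guarantees at-most-once calls,
# which is exactly the contract __close__ would need to relax.
weakref.finalize(a, closed.append, "a")
weakref.finalize(b, closed.append, "b")

del a, b
gc.collect()                   # cycle collected; both callbacks fire
print(sorted(closed))          # ['a', 'b']
```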

In the past, this has been rejected as insufficiently motivated, but
that may have changed.

-jJ


