[Python-Dev] Issue 10194 - Adding a gc.remap() function

Peter Ingebretson pingebre at yahoo.com
Tue Oct 26 19:11:00 CEST 2010


--- On Tue, 10/26/10, Hrvoje Niksic <hrvoje.niksic at avl.com> wrote:

> What about objects that don't implement tp_traverse because
> they cannot take part in cycles?

A significant majority of objects that can hold references to other 
objects can take part in cycles and do implement tp_traverse.  My 
original thought was that modifying any references not visible to 
the cyclic GC would be out of the scope of gc.remap.

Even adding a 'tp_extended_traverse' method might not help solve 
this problem because untracked objects are not in any generation list, 
so there is no general way to find all of them.
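
As a rough illustration of the problem (CPython-specific, and not part of 
the patch), the existing gc module already shows that untracked objects 
are invisible to the collector.  The helper name visible_to_gc is made 
up for this sketch; gc.is_tracked and gc.get_objects are the real calls.

```python
import gc


def visible_to_gc(obj):
    """Return True if obj appears among the objects the collector tracks."""
    return any(candidate is obj for candidate in gc.get_objects())


tracked_container = [1, 2, 3]      # lists are GC-tracked containers
untracked_atom = "just a string"   # str holds no references, so it is not tracked

print(gc.is_tracked(tracked_container), visible_to_gc(tracked_container))
print(gc.is_tracked(untracked_atom), visible_to_gc(untracked_atom))
```

Anything in the second category cannot be reached by walking the 
generation lists, whatever traverse slot it might implement.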

> Changing immutable objects such as tuples and frozensets
> doesn't exactly sound appealing.

My original Python-only approach cloned immutable objects that 
referenced objects that were to be remapped, and then added the 
old and new immutable object to the mapping.  This worked well, 
although it was somewhat complicated because it had to happen in 
dependency order (e.g., to handle tuples of tuples in frozensets).
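
A minimal sketch of that idea, not the original code: the function name 
clone_remapped and the id()-keyed mapping are assumptions made for 
illustration.  Recursing into the elements first gives the dependency 
order, since inner tuples are rebuilt before the containers that hold them.

```python
def clone_remapped(obj, mapping):
    """Return obj with remapped references, rebuilding immutables bottom-up.

    mapping goes from id(old_object) to the replacement object.  Keying by
    id() is only safe while the old objects are kept alive by the caller.
    """
    if id(obj) in mapping:
        return mapping[id(obj)]
    if type(obj) in (tuple, frozenset):
        # Remap the elements first, so nested immutables are handled
        # before the container that references them.
        pairs = [(item, clone_remapped(item, mapping)) for item in obj]
        if all(old is new for old, new in pairs):
            return obj  # nothing inside changed, keep the original
        clone = type(obj)(new for _, new in pairs)
        mapping[id(obj)] = clone  # record old -> new immutable as well
        return clone
    return obj


class OldThing:
    pass


class NewThing:
    pass


old, new = OldThing(), NewThing()
nested = frozenset({(1, (old, 2))})
mapping = {id(old): new}
result = clone_remapped(nested, mapping)
```

Here result is a new frozenset whose inner tuple refers to new, while 
nested itself is left untouched and is now present in mapping.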

I thought about keeping this, but I am now convinced that as long 
as you are doing something as drastic as changing references in the 
heap you may as well change immutable objects.

The main argument is that preserving immutable objects increases the 
complexity of remapping and does not actually solve many problems.  
The primary reason for objects to be immutable is so that their 
comparison operators and hash value can remain consistent.  Changing, 
for example, the contents of a tuple that a dictionary key references 
has the same effect as changing the identity of the tuple -- both 
modify the hash value of the key and thus invalidate the dictionary.  
The full reload process needs to rehash collections invalidated by 
hash values changing, so we might as well modify the contents of tuples.
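
The invalidation can be shown from pure Python.  Tuples cannot be 
modified there, so this sketch uses a hypothetical MutableKey class as a 
stand-in for a key whose hash changes underneath the dictionary; rehash 
is likewise an illustrative helper, not part of the proposal.

```python
class MutableKey:
    """Stand-in for a key whose hash follows a field that can change."""

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, MutableKey) and self.value == other.value


def rehash(table):
    """Rebuild a dict so every key is stored under its current hash."""
    return {key: value for key, value in table.items()}


key = MutableKey(1)
table = {key: "payload"}

key.value = 2                  # the hash changes while the key is in use
found_before = key in table    # lookup now probes under the new hash

table = rehash(table)          # re-insert every key under its current hash
found_after = key in table
```

The entry is still physically present after the mutation, but it can 
only be found again once the dictionary has been rebuilt.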

> > the signature of visitproc has been modified to take (PyObject **) 
> > instead of (PyObject *) so that a visitor can modify fields
> > visited with Py_VISIT.
> 
> This sounds like a bad idea -- visitproc is not limited to
> visiting struct members.  Visited objects can be stored
> in data structures where their address cannot be directly
> obtained.
>
> If you want to go this route, rather create an extended
> visit procedure (visitchangeproc?) that accepts a function
> that can change the reference.  A convenience function
> or macro could implement this for the common case of struct
> member or PyObject**.

This is a compelling argument.  I considered adding an extended 
traverse / visit path, but decided against it after not finding 
any cases in the base distribution that required it.  The 
disadvantage of creating an additional method is that C types will 
have yet another method to implement for the gc (tp_traverse, 
tp_clear, and now tp_traverse_modify(?)).  On the other hand, you've 
convinced me that this is necessary in some cases, so it might as 
well be used in all of them.  Jon Parise also pointed out in a 
private communication that this eliminates the minor performance 
impact on tp_traverse, which is another advantage over my change.

If a 'tp_traverse_modify' function were added, many types could 
replace their custom tp_clear function with a generic method 
that makes use of a visitchangeproc, which somewhat mitigates the 
cost of adding another method.
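
To make the shape of this concrete, here is a Python model of the idea 
rather than the C API itself.  The names traverse_modify, remapper and 
generic_clear are hypothetical; the visitor receives the current 
referent and returns what should be stored in its place, so the type 
never has to expose the address of a field.

```python
class Node:
    """Toy container type holding two references."""

    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right

    def traverse_modify(self, visit):
        # The type decides how each reference is written back, so this
        # also works when a reference lives inside a packed structure.
        self.left = visit(self.left)
        self.right = visit(self.right)


def remapper(mapping):
    """Build a visitor that swaps objects found in mapping (keyed by id)."""
    return lambda obj: mapping.get(id(obj), obj)


def generic_clear(obj):
    """Generic clear: a modifying visitor that drops every reference."""
    obj.traverse_modify(lambda _referent: None)


a, b, c = object(), object(), object()
node = Node(left=a, right=b)

node.traverse_modify(remapper({id(a): c}))   # node.left now refers to c
remapped = (node.left, node.right)

generic_clear(node)                          # both references are dropped
```

The same method serves both remapping and clearing, which is the 
mitigation described above.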



      

