[Python-Dev] Py_CLEAR to avoid crashes

Daniel Stutzbach daniel at stutzbachenterprises.com
Tue Feb 19 01:13:24 CET 2008


On Feb 18, 2008 4:52 PM, Neil Schemenauer <nas at arctrix.com> wrote:

> That sucks.  Most Py_DECREF calls are probably okay but it's going
> to be hard to find the ones that are not.  I can't think of anything
> we can do to make this trap harder to fall into.  Even using
> Py_CLEAR as a blunt tool is not a total solution. You could still
> end up with a null pointer dereference if the code is not written
> carefully.
>

Container types (particularly lists) go to great lengths to postpone
object deletion.  For example, to delete a slice from a list, the items
being removed are first copied to a temporary array, then the list object's
pointers are modified, and only then are all the Py_DECREFs called, just
before returning.
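
To illustrate that ordering, here is a minimal sketch using toy types rather
than the real listobject.c code.  Item, ToyList, item_decref and
toylist_delete_slice are all hypothetical names made up for this example;
the point is only that the container is back in a consistent state before
any decref (and thus any destructor) can run.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for PyObject and PyListObject (hypothetical types). */
typedef struct { int refcnt; } Item;
typedef struct { Item **items; size_t size; } ToyList;

/* In real code, dropping the last reference could run arbitrary
   destructor code that looks at the list again. */
static void item_decref(Item *it) { it->refcnt--; }

/* Remove list->items[lo:hi].  Returns 0 on success, -1 on failure. */
static int toylist_delete_slice(ToyList *list, size_t lo, size_t hi)
{
    assert(list != NULL);
    if (lo > hi || hi > list->size)
        return -1;
    size_t n = hi - lo;
    if (n == 0)
        return 0;

    /* 1. Copy the doomed pointers to a temporary array. */
    Item **doomed = malloc(n * sizeof *doomed);
    if (doomed == NULL)
        return -1;
    memcpy(doomed, list->items + lo, n * sizeof *doomed);

    /* 2. Fix up the list so it is fully consistent again. */
    memmove(list->items + lo, list->items + hi,
            (list->size - hi) * sizeof *list->items);
    list->size -= n;

    /* 3. Only now drop the references, just before returning. */
    for (size_t i = 0; i < n; i++)
        item_decref(doomed[i]);
    free(doomed);
    return 0;
}
```

If step 3 came before step 2, a destructor that touched the list would see
pointers to objects that are already being torn down.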

I have always seen this as a robustness versus efficiency issue.  It's
theoretically possible to set things up so that reference count decrements
are actually postponed until after the C method/slot returns, but it's
slower than doing it immediately.  I wonder if adding support for postponed
decrements (without making it mandatory) would at least make the trap harder
to fall into.

For example:

- maintain a global array of pending decrefs
- before calling into any C method/slot, save the index of the current
end-of-array (in a local C variable on the stack)
- call the C method, which may call Py_DECREF_LATER(x) to append x to the
global array
- when the C method returns, decref anything newly appended to the array

The array would grow and shrink just as a list does (O(1) amortized time to
add/remove a pointer).
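
A rough sketch of the steps above, again with toy types instead of the real
C API.  Obj, obj_decref, decref_later (standing in for the proposed
Py_DECREF_LATER) and call_method are hypothetical names, and shrinking the
array is left out for brevity; this is one way it might look, not a worked
out patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for PyObject (hypothetical). */
typedef struct { int refcnt; int freed; } Obj;

/* Global array of pending decrefs. */
static Obj **pending = NULL;
static size_t pending_len = 0;
static size_t pending_cap = 0;

static void obj_decref(Obj *o)
{
    if (--o->refcnt == 0)
        o->freed = 1;   /* real code would call the deallocator here */
}

/* Stand-in for Py_DECREF_LATER(x): append x to the global array.
   Doubling the capacity gives O(1) amortized appends.
   Returns 0 on success, -1 if memory runs out. */
static int decref_later(Obj *o)
{
    if (pending_len == pending_cap) {
        size_t cap = pending_cap ? pending_cap * 2 : 16;
        Obj **p = realloc(pending, cap * sizeof *p);
        if (p == NULL)
            return -1;
        pending = p;
        pending_cap = cap;
    }
    pending[pending_len++] = o;
    return 0;
}

typedef void (*method_fn)(Obj *arg);

/* Wrapper used whenever the interpreter calls into a C method/slot. */
static void call_method(method_fn fn, Obj *arg)
{
    size_t mark = pending_len;      /* save the current end-of-array */
    fn(arg);                        /* may call decref_later() */
    assert(pending_len >= mark);    /* callee must not drain our entries */
    while (pending_len > mark)      /* decref anything newly appended */
        obj_decref(pending[--pending_len]);
}

/* Example method: defers a decref and records what it still sees. */
static int seen_refcnt;
static void sample_method(Obj *arg)
{
    decref_later(arg);
    seen_refcnt = arg->refcnt;      /* object is still alive here */
}
```

Because the loop re-reads pending_len each time, a deallocator that itself
defers more decrefs would have them drained by the same call.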

This would simplify a number of places in listobject.c as well as remove the
need for Py_TRASHCAN_*.  It would be entirely optional, so anyone who is
very careful and wants the speed of Py_DECREF can have it.  Also, the
deferment is very brief, since the decrefs occur right after the C method
returns.

The downside is having to store and check the global array length on every C
method call (basically 3 machine instructions).  The machine instructions
aren't so bad, but I'm not sure about the effects on the CPU cache.

So, like I said, a robustness versus performance trade-off. :-(

-- 
Daniel Stutzbach, Ph.D.             President, Stutzbach Enterprises LLC