C API PyObject_Call segfaults with string

MRAB python at mrabarnett.plus.com
Wed Feb 9 21:43:45 EST 2022


On 2022-02-10 01:37, Jen Kris via Python-list wrote:
> I'm using Python 3.8 so I tried your second choice:
> 
> pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem);
> 
> but pSents is 0x0.  pSentMod and pListItem are valid pointers.
> 
'PyObject_CallFunction' looks like a good one to use:

"""PyObject* PyObject_CallFunction(PyObject *callable, const char *format, ...)

Call a callable Python object callable, with a variable number of C 
arguments. The C arguments are described using a Py_BuildValue() style 
format string. The format can be NULL, indicating that no arguments are 
provided.
"""

[snip]

What I do is add comments to keep track of what objects I have 
references to at each point and whether they are new references or could 
be NULL.

For example:

     pName = PyUnicode_FromString("nltk.corpus");
     //> pName+?

This means that 'pName' contains a reference, '+' means that it's a new 
reference, and '?' means that it could be NULL (usually due to an 
exception, but not always) so I need to check it.

Continuing in this vein:

     pModule = PyImport_Import(pName);
     //> pName+? pModule+?

     pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
     //> pName+? pModule+? pSubMod+?
     pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
     //> pName+? pModule+? pSubMod+? pFidMod+?
     pSentMod = PyObject_GetAttrString(pSubMod, "sents");
     //> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+?

     pFileIds = PyObject_CallObject(pFidMod, 0);
     //> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+? pFileIds+?
     pListItem = PyList_GetItem(pFileIds, listIndex);
     //> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+? pFileIds+?
     //> pListItem?
     pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
     //> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+? pFileIds+?
     //> pListItem? pListStrE+?

As you can see, there are a lot of references building up, and every new 
one will leak unless it is DECREFed.

Note how after:

     pListItem = PyList_GetItem(pFileIds, listIndex);

the addition is:

     //> pListItem?

This means that 'pListItem' contains a borrowed (not new) reference, but 
could be NULL.

I find it easiest to DECREF as soon as I no longer need the reference, 
and to remove a name from the list as soon as I no longer need it (and 
have DECREFed it where necessary).

For example:

     pName = PyUnicode_FromString("nltk.corpus");
     //> pName+?
     if (!pName)
         goto error;
     //> pName+
     pModule = PyImport_Import(pName);
     //> pName+ pModule+?
     Py_DECREF(pName);
     //> pModule+?
     if (!pModule)
         goto error;
     //> pModule+

I find that doing this greatly reduces the chances of getting the 
reference counting wrong, and I can remove the comments once I've 
finished the function I'm writing.

