C API PyObject_Call segfaults with string

Inada Naoki songofacandy at gmail.com
Wed Feb 9 19:52:34 EST 2022


On Thu, Feb 10, 2022 at 9:42 AM Jen Kris via Python-list
<python-list at python.org> wrote:
>
> I have everything finished down to the last line (sentences = gutenberg.sents(fileid)) where I use  PyObject_Call to call gutenberg.sents, but it segfaults.  The fileid is a string -- the first fileid in this corpus is "austen-emma.txt."
>
> pName = PyUnicode_FromString("nltk.corpus");
> pModule = PyImport_Import(pName);
>
> pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
> pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
> pSentMod = PyObject_GetAttrString(pSubMod, "sents");
>
> pFileIds = PyObject_CallObject(pFidMod, 0);
> pListItem = PyList_GetItem(pFileIds, listIndex);
> pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
> pListStr = PyBytes_AS_STRING(pListStrE);
> Py_DECREF(pListStrE);

The first bug is here.
PyBytes_AS_STRING() returns a pointer into the internal buffer of the pListStrE object, not a copy.
So Py_DECREF(pListStrE) can free that buffer, which leaves pListStr as a dangling pointer.

>
> // sentences = gutenberg.sents(fileid)
> PyObject *c_args = Py_BuildValue("s", pListStr);

Why do you encode pListItem to bytes and then decode it again?
Why don't you just pass pListItem itself?

> PyObject *NullPtr = 0;
> pSents = PyObject_Call(pSentMod, c_args, NullPtr);
>

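c_args must be a tuple, but you passed a unicode object here.
With a single format unit such as "s", Py_BuildValue() returns that one object, not a tuple.
Parenthesize the format, e.g. "(s)" or "(O)", to get a 1-tuple.
See https://docs.python.org/3/c-api/arg.html#c.Py_BuildValue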


> The final line segfaults:
> Program received signal SIGSEGV, Segmentation fault.
> 0x00007ffff6e4e8d5 in _PyEval_EvalCodeWithName ()
>    from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0
>
> My guess is the problem is in Py_BuildValue, which returns a pointer but it may not be constructed correctly.  I also tried it with "O" and it doesn't segfault but it returns 0x0.
>
> I'm new to using the C API.  Thanks for any help.
>
> Jen
>
>
> --
> https://mail.python.org/mailman/listinfo/python-list

Bests,

-- 
Inada Naoki  <songofacandy at gmail.com>

