C API PyObject_Call segfaults with string

Jen Kris jenkris at tutanota.com
Wed Feb 9 19:40:59 EST 2022


This is a follow-on to a question I asked yesterday, which was answered by MRAB. I'm using the Python C API to load the Gutenberg corpus from the nltk library and iterate through its sentences. The Python code I am trying to replicate is:

from nltk.corpus import gutenberg
for i, fileid in enumerate(gutenberg.fileids()):
    sentences = gutenberg.sents(fileid)
    # ... (rest of the loop body)
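For comparison, the Python call `gutenberg.sents(fileid)` is the same call as `sents(*args)` with `args = (fileid,)`. `PyObject_Call(callable, args, kwargs)` takes exactly that shape: `args` must be a tuple, and `kwargs` may be NULL. The sketch below uses a hypothetical `FakeCorpus` class in place of nltk's `gutenberg` object so that it runs without the corpus installed. Only the call shapes are meant to carry over.

```python
class FakeCorpus:
    """Hypothetical stand-in for nltk.corpus.gutenberg (no corpus download needed)."""

    def fileids(self):
        return ["austen-emma.txt"]

    def sents(self, fileid):
        # Placeholder data; the real method returns tokenized sentences.
        return [["A", "sentence", "from", fileid]]


gutenberg = FakeCorpus()

# Python-level counterpart of PyObject_GetAttrString(pSubMod, "sents").
sents = getattr(gutenberg, "sents")

fileid = gutenberg.fileids()[0]

# PyObject_Call expects the positional arguments packed in a tuple...
args = (fileid,)
direct = sents(fileid)
packed = sents(*args)  # same call, arguments supplied as a tuple

# ...whereas treating the bare str as the argument sequence is a different call:
# it unpacks into one argument per character and raises TypeError.
try:
    sents(*fileid)
    unpacked_str_failed = False
except TypeError:
    unpacked_str_failed = True
```

In C the same mistake is not caught as cleanly. In a release build `PyObject_Call` only asserts that `args` is a tuple, so a str passed there is read as if it had a tuple's layout. That would plausibly explain a crash deep inside `_PyEval_EvalCodeWithName`.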

I have everything finished down to the last line (sentences = gutenberg.sents(fileid)), where I use PyObject_Call to call gutenberg.sents, but it segfaults. The fileid is a string; the first fileid in this corpus is "austen-emma.txt".

pName = PyUnicode_FromString("nltk.corpus");
pModule = PyImport_Import(pName);

pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
pSentMod = PyObject_GetAttrString(pSubMod, "sents");

pFileIds = PyObject_CallObject(pFidMod, 0);
pListItem = PyList_GetItem(pFileIds, listIndex);
pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
pListStr = PyBytes_AS_STRING(pListStrE);
Py_DECREF(pListStrE);

// sentences = gutenberg.sents(fileid)
PyObject *c_args = Py_BuildValue("s", pListStr);  
PyObject *NullPtr = 0;
pSents = PyObject_Call(pSentMod, c_args, NullPtr);

The final line segfaults:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6e4e8d5 in _PyEval_EvalCodeWithName ()
   from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0

My guess is that the problem is in Py_BuildValue, which returns a pointer, but the object it points to may not be constructed correctly. I also tried the "O" format; that doesn't segfault, but it returns 0x0.
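The guess about Py_BuildValue points in the right direction. With the format `"s"`, Py_BuildValue returns a bare str object. A tuple only comes back when the format is wrapped in parentheses, as in `"(s)"`. Note also that `"O"` expects a `PyObject *`, such as `pListItem`, rather than a `char *`. The sketch below calls the real C functions through `ctypes.pythonapi` so the return types can be inspected from Python. The `sents` function is a hypothetical stand-in for `gutenberg.sents`. The sketch assumes a CPython build on a platform where ctypes can call variadic C functions this way (x86-64 Linux, for example). The `"(O)"` variant packs an existing object directly. That also avoids reading `pListStr` after `Py_DECREF(pListStrE)` may have freed the bytes object that owns that buffer.

```python
import ctypes

# ctypes.pythonapi is a PyDLL: calls keep the GIL held and raise any Python
# exception the C function leaves set.
api = ctypes.pythonapi

# Py_BuildValue(const char *format, ...) returns a new reference.
api.Py_BuildValue.restype = ctypes.py_object

# PyObject_Call(callable, args, kwargs): args must be a tuple, kwargs may be NULL.
api.PyObject_Call.restype = ctypes.py_object
api.PyObject_Call.argtypes = [ctypes.py_object, ctypes.py_object, ctypes.c_void_p]


def sents(fileid):
    """Hypothetical stand-in for gutenberg.sents."""
    return [["A", "sentence", "from", fileid]]


fileid = "austen-emma.txt"
raw = fileid.encode("utf-8")  # plays the role of pListStr

# "s": a bare str, which is NOT a valid args argument for PyObject_Call.
bare = api.Py_BuildValue(b"s", raw)

# "(s)": a 1-tuple holding a new str built from the char * data.
from_chars = api.Py_BuildValue(b"(s)", raw)

# "(O)": a 1-tuple holding an existing object (like pListItem); no char * needed.
from_obj = api.Py_BuildValue(b"(O)", ctypes.py_object(fileid))

# Passing None for the c_void_p parameter sends NULL kwargs, as in the C code.
result = api.PyObject_Call(sents, from_obj, None)

print(type(bare).__name__, type(from_chars).__name__, result)
```

In the C program itself, the corresponding change would be to build the arguments with `Py_BuildValue("(O)", pListItem)` (or `PyTuple_Pack(1, pListItem)`), pass that tuple to `PyObject_Call`, and `Py_DECREF` the tuple once the call returns.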

I'm new to using the C API.  Thanks for any help. 

Jen




More information about the Python-list mailing list