C API PyObject_CallFunctionObjArgs returns incorrect result

MRAB python at mrabarnett.plus.com
Sun Mar 6 20:42:05 EST 2022


On 2022-03-07 00:32, Jen Kris via Python-list wrote:
> I am using the C API in Python 3.8 with the nltk library, and I have a problem with the return from a library call implemented with PyObject_CallFunctionObjArgs.
> 
> This is the relevant Python code:
> 
> import nltk
> from nltk.corpus import gutenberg
> fileids = gutenberg.fileids()
> sentences = gutenberg.sents(fileids[0])
> sentence = sentences[0]
> sentence = " ".join(sentence)
> pt = nltk.word_tokenize(sentence)
> 
> I run this at the Python command prompt to show how it works:
> >>> sentence = " ".join(sentence)
> >>> pt = nltk.word_tokenize(sentence)
> >>> print(pt)
> ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']
> >>> type(pt)
> <class 'list'>
> 
> This is the relevant part of the C API code:
> 
> PyObject* str_sentence = PyObject_Str(pSentence);
> // nltk.word_tokenize(sentence)
> PyObject* pNltk_WTok = PyObject_GetAttrString(pModule_mstr, "word_tokenize");
> PyObject* pWTok = PyObject_CallFunctionObjArgs(pNltk_WTok, str_sentence, NULL);
> 
> (where pModule_mstr is the nltk library).
> 
> That should produce a list with a length of 7 that looks like it does on the command line version shown above:
> 
> ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']
> 
> But instead the C API produces a list with a length of 24, and the REPR looks like this:
> 
> '[\'[\', "\'", \'[\', "\'", \',\', "\'Emma", "\'", \',\', "\'by", "\'", \',\', "\'Jane", "\'", \',\', "\'Austen", "\'", \',\', "\'1816", "\'", \',\', "\'", \']\', "\'", \']\']'
> 
> I also tried this with PyObject_CallMethodObjArgs and PyObject_Call without success.
> 
> Thanks for any help on this.
> 
What is pSentence? Is it what you think it is?
To me it looks like it's either the list:

     ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']

or that list as a string:

     "['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']"

and that's what you're tokenising.
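
To illustrate the second possibility: PyObject_Str(obj) is the C equivalent of str(obj), so if pSentence is still the list of tokens, str_sentence ends up as the list's repr rather than the space-joined sentence that the Python version passes. Here is a minimal Python sketch of the difference, assuming pSentence holds the token list shown above (the variable names are mine, not from your code):

```python
# Assumption: pSentence is still the list of tokens, i.e. sentences[0]
# before the " ".join() step in the Python version.
sentence_tokens = ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']

# What the Python code passes to nltk.word_tokenize():
joined = " ".join(sentence_tokens)

# What PyObject_Str(pSentence) would produce for a list,
# which is the same as calling str() on it:
stringified = str(sentence_tokens)

print(repr(joined))       # the plain sentence
print(repr(stringified))  # the list's repr, brackets, quotes and commas included
```

Tokenising the second string would split on all those extra quotes and commas, which would explain a result much longer than 7 items. If that is what's happening, joining the tokens on the C side (for example with PyUnicode_Join and a " " separator) instead of calling PyObject_Str should give word_tokenize the same input as the Python version.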

