Embedding Python crash on PyTuple_New

MRAB python at mrabarnett.plus.com
Tue Nov 23 19:46:05 EST 2021


On 2021-11-23 20:25, Arnaud Loonstra wrote:
> On 23-11-2021 18:31, MRAB wrote:
>> On 2021-11-23 16:04, Arnaud Loonstra wrote:
>>> On 23-11-2021 16:37, MRAB wrote:
>>>> On 2021-11-23 15:17, MRAB wrote:
>>>>> On 2021-11-23 14:44, Arnaud Loonstra wrote:
>>>>>> On 23-11-2021 15:34, MRAB wrote:
>>>>>>> On 2021-11-23 12:07, Arnaud Loonstra wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I've had Python embedded successfully in a program up until now,
>>>>>>>> but I'm now running into weird GC-related segfaults. I'm currently
>>>>>>>> trying to debug this, but my understanding of CPython limits me here.
>>>>>>>>
>>>>>>>> I'm creating a tuple in C, but after a while it crashes on creation.
>>>>>>>> That doesn't make sense, which makes me wonder whether something else
>>>>>>>> is happening. Could it be crashing here because the GC is cleaning up
>>>>>>>> stuff completely unrelated to the allocation of the new tuple? How
>>>>>>>> can I troubleshoot this?
>>>>>>>>
>>>>>>>> I've got CPython compiled with  --with-valgrind --without-pymalloc
>>>>>>>> --with-pydebug
>>>>>>>>
>>>>>>>> In C I'm creating a tuple with the following method:
>>>>>>>>
>>>>>>>> static PyObject *
>>>>>>>> s_py_zosc_tuple(pythonactor_t *self, zosc_t *oscmsg)
>>>>>>>> {
>>>>>>>>       assert(self);
>>>>>>>>       assert(oscmsg);
>>>>>>>>       char *format = zosc_format(oscmsg);
>>>>>>>>
>>>>>>>>       PyObject *rettuple = PyTuple_New((Py_ssize_t) strlen(format));
>>>>>>>>
>>>>>>>> It segfaults here (frame 16) after 320 times (consistently)
>>>>>>>>
>>>>>>>>
>>>>>>>> 1   __GI_raise             raise.c          49   0x7ffff72c4e71
>>>>>>>> 2   __GI_abort             abort.c          79   0x7ffff72ae536
>>>>>>>> 3   fatal_error            pylifecycle.c    2183 0x7ffff7d84b4f
>>>>>>>> 4   Py_FatalError          pylifecycle.c    2193 0x7ffff7d878b2
>>>>>>>> 5   _PyObject_AssertFailed object.c         2200 0x7ffff7c93cf2
>>>>>>>> 6   visit_decref           gcmodule.c       378  0x7ffff7dadfd5
>>>>>>>> 7   tupletraverse          tupleobject.c    623  0x7ffff7ca3e81
>>>>>>>> 8   subtract_refs          gcmodule.c       406  0x7ffff7dad340
>>>>>>>> 9   collect                gcmodule.c       1054 0x7ffff7dae838
>>>>>>>> 10  collect_with_callback  gcmodule.c       1240 0x7ffff7daf17b
>>>>>>>> 11  collect_generations    gcmodule.c       1262 0x7ffff7daf3f6
>>>>>>>> 12  _PyObject_GC_Alloc     gcmodule.c       1977 0x7ffff7daf4f2
>>>>>>>> 13  _PyObject_GC_Malloc    gcmodule.c       1987 0x7ffff7dafebc
>>>>>>>> 14  _PyObject_GC_NewVar    gcmodule.c       2016 0x7ffff7daffa5
>>>>>>>> 15  PyTuple_New            tupleobject.c    118  0x7ffff7ca4da7
>>>>>>>> 16  s_py_zosc_tuple        pythonactor.c    366  0x55555568cc82
>>>>>>>> 17  pythonactor_socket     pythonactor.c    664  0x55555568dac7
>>>>>>>> 18  pythonactor_handle_msg pythonactor.c    862  0x55555568e472
>>>>>>>> 19  pythonactor_handler    pythonactor.c    828  0x55555568e2e2
>>>>>>>> 20  sphactor_actor_run     sphactor_actor.c 855  0x5555558cb268
>>>>>>>> ... <More>
>>>>>>>>
>>>>>>>> Any pointer really appreciated.
>>>>>>>>
>>>>>>> You're creating a tuple that'll have the same number of members as 
>>>>>>> the length of a string? That looks strange to me.
>>>>>>>
>>>>>>> How are you setting the tuple's members?
>>>>>>
>>>>>> It's from a serialisation format called OSC. The string describes the
>>>>>> types of the data in the message; every character is a type.
>>>>>>
>>>>>> I'm creating the tuple as follows:
>>>>>>
>>>>>> PyObject *rettuple = PyTuple_New((Py_ssize_t) strlen(format) );
>>>>>>
>>>>>> Then I iterate over the OSC message using the format string (showing
>>>>>> only the handling of an int, 'i'):
>>>>>>
>>>>>>       char type = '0';
>>>>>>       Py_ssize_t pos = 0;
>>>>>>       const void *data =  zosc_first(oscmsg, &type);
>>>>>>       while(data)
>>>>>>       {
>>>>>>           switch (type)
>>>>>>           {
>>>>>>           case('i'):
>>>>>>           {
>>>>>>               int32_t val = 9;
>>>>>>               int rc = zosc_pop_int32(oscmsg, &val);
>>>>>>               assert(rc == 0);
>>>>>>               PyObject *o = PyLong_FromLong((long)val);
>>>>>>               assert( o );
>>>>>>               rc = PyTuple_SetItem(rettuple, pos, o);
>>>>>>               assert(rc == 0);
>>>>>>               break;
>>>>>>           }
>>>>>>
>>>>>> Full code is here:
>>>>>>
>>>>>> https://github.com/hku-ect/gazebosc/blob/822452dfa27201db274d37ce09e835d98fe500b2/Actors/pythonactor.c#L360 
>>>>>>
>>>>>>
>>>>> Looking at that code, you have:
>>>>>
>>>>>       PyObject *o = Py_BuildValue("s#", str, 1);
>>>>>
>>>>> What I'd check is the type of the 1 that you're passing. Wouldn't the
>>>>> compiler assume that it's an int?
>>>>>
>>>>> The format string tells the function to expect a Py_ssize_t, but how
>>>>> would the compiler know that?
>>>>>
>>>> Looking at https://www.mankier.com/3/zosc, it says for 'T' and 'F' 
>>>> "(no value required)", but you're doing:
>>>>
>>>>      int rc = zosc_pop_bool(oscmsg, &bl);
>>>>
>>>> If no value is required, is there a bool there to be popped?
>>>
>>> The value is only required to set a user provided boolean to the value
>>> in the message. There's no boolean value encoded in the message, only
>>> the T and F in the format string.
>>>
>>> With regard to Py_BuildValue("s#", str, 1), that's a valid point.
>>> I'll fix that. However, in the segfaults I'm testing, that code is not
>>> touched.
>> 
>> You can use "C" as a format string for Py_BuildValue to convert a C int 
>> representing a character to a Python string.
>> 
>>> I'm now testing different parts of the code to see if it runs stable.
>>> I've found it runs stable if I do not process the returned tuple.
>>>
>>> PyObject *pReturn = PyObject_CallMethod(self->pyinstance,
>>>                       "handleSocket", "sOsss",
>>>                       oscaddress,
>>>                       py_osctuple,
>>>                       ev->type, ev->name, strdup(ev->uuid));
>>> Py_XINCREF(pReturn);
>>>
>> Why the Py_XINCREF? PyObject_CallMethod returns a new reference. The 
>> Py_DECREF that you do later won't destroy the object because of that 
>> additional Py_XINCREF, so that's a memory leak.
>> 
>>> https://github.com/hku-ect/gazebosc/blob/505b30c46bf3f78d188c3f575c80e294d3db7e5d/Actors/pythonactor.c#L673 
>>>
>>>
>>> and a bit further in the code I convert the Python tuple to an OSC 
>>> message:
>>>
>>> zosc_t *retosc = s_py_zosc(pAddress, pData);
>>>
>>> https://github.com/hku-ect/gazebosc/blob/505b30c46bf3f78d188c3f575c80e294d3db7e5d/Actors/pythonactor.c#L732 
>>>
>>>
>>> If I change that line to:
>>>
>>> zosc_t *retosc = zosc_create("/temp", "ii", 32, 64);
>>>
>>> It runs stable.
>>>
>>> I would turn my attention to the s_py_zosc function, but I'm not sure.
>>> Since the errors are GC related, could they be caused anywhere?
>>>
>> Basically, yes, but I wouldn't be surprised if it were due to too few 
>> INCREFs or too many DECREFs somewhere.
>> 
>>> https://github.com/hku-ect/gazebosc/blob/505b30c46bf3f78d188c3f575c80e294d3db7e5d/Actors/pythonactor.c#L286 
>>>
>>>
>> Incidentally, in s_py_zosc_tuple, you're not doing "assert(rc == 0);" 
>> after "zosc_pop_float" or "zosc_pop_double".
> 
> Thanks for those pointers! I think your intuition is right. I might have
> found the bugger. In s_py_zosc I call Py_DECREF on pAddress and pData.
> However they are acquired by PyTuple_GetItem which returns a borrowed
> reference. I think pAddress and pData are then also 'decrefed' when the
> pReturn tuple which contains pAddress and pData is 'decrefed'?
> 
Yes, members of a container are DECREFed when the container is destroyed.

It's bad practice for a function to DECREF its arguments unless the 
function's sole purpose is cleanup because the function won't know where 
the arguments came from.

> I'm testing it now, and it has been running stable for a while.
> 
> Preparing a PR: https://github.com/hku-ect/gazebosc/pull/181
> 


