Embedding Python crash on PyTuple_New

MRAB python at mrabarnett.plus.com
Wed Nov 24 13:15:43 EST 2021


On 2021-11-24 07:59, Arnaud Loonstra wrote:
> 
> On 24-11-2021 01:46, MRAB wrote:
>> On 2021-11-23 20:25, Arnaud Loonstra wrote:
>>> On 23-11-2021 18:31, MRAB wrote:
>>>> On 2021-11-23 16:04, Arnaud Loonstra wrote:
>>>>> On 23-11-2021 16:37, MRAB wrote:
>>>>>> On 2021-11-23 15:17, MRAB wrote:
>>>>>>> On 2021-11-23 14:44, Arnaud Loonstra wrote:
>>>>>>>> On 23-11-2021 15:34, MRAB wrote:
>>>>>>>>> On 2021-11-23 12:07, Arnaud Loonstra wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I've got Python embedded successfully in a program up until now
>>>>>>>>>> as I'm now running into weird GC related segfaults. I'm currently
>>>>>>>>>> trying to debug this but my understanding of CPython limits me here.
>>>>>>>>>>
>>>>>>>>>> I'm creating a Tuple in C but it crashes on creating it after a
>>>>>>>>>> while. It doesn't make sense which makes me wonder something else
>>>>>>>>>> must be happening? Could be it just crashes here because the GC is
>>>>>>>>>> cleaning up stuff completely unrelated to the allocation of the new
>>>>>>>>>> tuple? How can I troubleshoot this?
>>>>>>>>>>
>>>>>>>>>> I've got CPython compiled with  --with-valgrind --without-pymalloc
>>>>>>>>>> --with-pydebug
>>>>>>>>>>
>>>>>>>>>> In C I'm creating a tuple with the following method:
>>>>>>>>>>
>>>>>>>>>> static PyObject *
>>>>>>>>>> s_py_zosc_tuple(pythonactor_t *self, zosc_t *oscmsg)
>>>>>>>>>> {
>>>>>>>>>>       assert(self);
>>>>>>>>>>       assert(oscmsg);
>>>>>>>>>>       char *format = zosc_format(oscmsg);
>>>>>>>>>>
>>>>>>>>>>       PyObject *rettuple = PyTuple_New((Py_ssize_t) strlen(format));
>>>>>>>>>>
>>>>>>>>>> It segfaults here (frame 16) after 320 times (consistently)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 1   __GI_raise             raise.c          49   0x7ffff72c4e71
>>>>>>>>>> 2   __GI_abort             abort.c          79   0x7ffff72ae536
>>>>>>>>>> 3   fatal_error            pylifecycle.c    2183 0x7ffff7d84b4f
>>>>>>>>>> 4   Py_FatalError          pylifecycle.c    2193 0x7ffff7d878b2
>>>>>>>>>> 5   _PyObject_AssertFailed object.c         2200 0x7ffff7c93cf2
>>>>>>>>>> 6   visit_decref           gcmodule.c       378  0x7ffff7dadfd5
>>>>>>>>>> 7   tupletraverse          tupleobject.c    623  0x7ffff7ca3e81
>>>>>>>>>> 8   subtract_refs          gcmodule.c       406  0x7ffff7dad340
>>>>>>>>>> 9   collect                gcmodule.c       1054 0x7ffff7dae838
>>>>>>>>>> 10  collect_with_callback  gcmodule.c       1240 0x7ffff7daf17b
>>>>>>>>>> 11  collect_generations    gcmodule.c       1262 0x7ffff7daf3f6
>>>>>>>>>> 12  _PyObject_GC_Alloc     gcmodule.c       1977 0x7ffff7daf4f2
>>>>>>>>>> 13  _PyObject_GC_Malloc    gcmodule.c       1987 0x7ffff7dafebc
>>>>>>>>>> 14  _PyObject_GC_NewVar    gcmodule.c       2016 0x7ffff7daffa5
>>>>>>>>>> 15  PyTuple_New            tupleobject.c    118  0x7ffff7ca4da7
>>>>>>>>>> 16  s_py_zosc_tuple        pythonactor.c    366  0x55555568cc82
>>>>>>>>>> 17  pythonactor_socket     pythonactor.c    664  0x55555568dac7
>>>>>>>>>> 18  pythonactor_handle_msg pythonactor.c    862  0x55555568e472
>>>>>>>>>> 19  pythonactor_handler    pythonactor.c    828  0x55555568e2e2
>>>>>>>>>> 20  sphactor_actor_run     sphactor_actor.c 855  0x5555558cb268
>>>>>>>>>> ... <More>
>>>>>>>>>>
>>>>>>>>>> Any pointer really appreciated.
> 
> [snip]
> 
>>>>>
>>>> Basically, yes, but I won't be surprised if it was due to too few 
>>>> INCREFs or too many DECREFs somewhere.
>>>>
>>>>> https://github.com/hku-ect/gazebosc/blob/505b30c46bf3f78d188c3f575c80e294d3db7e5d/Actors/pythonactor.c#L286 
>>>>>
>>>>>
>>>> Incidentally, in s_py_zosc_tuple, you're not doing "assert(rc == 0);" 
>>>> after "zosc_pop_float" or "zosc_pop_double".
>>>
>>> Thanks for those pointers! I think your intuition is right. I might have
>>> found the bugger. In s_py_zosc I call Py_DECREF on pAddress and pData.
>>> However they are acquired by PyTuple_GetItem which returns a borrowed
>>> reference. I think pAddress and pData are then also 'decrefed' when the
>>> pReturn tuple which contains pAddress and pData is 'decrefed'?
>>>
>> Yes, members of a container are DECREFed when the container is destroyed.
>> 
>> It's bad practice for a function to DECREF its arguments unless the 
>> function's sole purpose is cleanup because the function won't know where 
>> the arguments came from.
>> 
> 
> I'm finding that out now. What struck me was how hard this was to debug.
> I think that was because I had INCREFed the return object, presumably to
> work around the wrong DECREF of the data in the return object. That extra
> INCREF made it hell to debug. I'm really curious what the best practices
> are for debugging embedded CPython.
> 
> Thanks big time for your feedback!
> 
What I do when writing the code is add comments showing what variables 
refer to an object at that point in the code, each suffixed with "+" if 
it owns a reference and/or "?" if it could be NULL.

Example 1:

     //>
     PyObject *my_tuple = PyTuple_New(count);
     //> my_tuple+?
     if (!my_tuple)
          goto error;
     //> my_tuple+

"//>" means that there are no variables that point to an object.

"//> my_tuple+?" means that "my_tuple" points to an object and it owns a 
reference, but it might be NULL.

"//> my_tuple+" means that "my_tuple" points to an object and it owns a 
reference.

Example 2:

     //>
     PyObject *my_item = PyList_GetItem(my_list, index);
     //> my_item?
     if (!my_item)
          goto error;
     //> my_item

"//>" means that there are no variables that point to an object.

"//> my_item?" means that "my_item" points to an object, but it might 
be NULL. There is no "+" because PyList_GetItem returns a borrowed 
reference, so "my_item" doesn't own one.

"//> my_item" means that "my_item" points to an object.
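
As a fuller illustration of the notation, here is a sketch using a toy
reference-counting model in plain C. It is not the CPython API: every
toy_* name is made up for this example, and toy_set_item/toy_get_item
only imitate the stealing and borrowing behaviour of PyTuple_SetItem and
PyTuple_GetItem. It is meant to show why a variable without "+" must not
be DECREFed: the container releases its member when it is destroyed.

```c
/* Toy reference-counting model, NOT the CPython API.
   Every toy_* name is hypothetical and exists only for illustration. */
#include <assert.h>
#include <stdlib.h>

typedef struct toy_obj {
    int refcnt;              /* number of owned references */
    struct toy_obj *item;    /* one contained object, or NULL */
} toy_obj;

static int toy_live = 0;     /* objects currently allocated */

/* Returns a new (owned) reference, or NULL on allocation failure. */
static toy_obj *toy_new(void)
{
    toy_obj *o = calloc(1, sizeof *o);
    if (!o)
        return NULL;
    o->refcnt = 1;
    toy_live++;
    return o;
}

/* Drops one reference; destroying a container also drops its item. */
static void toy_decref(toy_obj *o)
{
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        if (o->item)
            toy_decref(o->item);
        free(o);
        toy_live--;
    }
}

/* Takes over the caller's reference to item (imitates PyTuple_SetItem). */
static void toy_set_item(toy_obj *box, toy_obj *item) { box->item = item; }

/* Returns a borrowed reference (imitates PyTuple_GetItem). */
static toy_obj *toy_get_item(toy_obj *box) { return box->item; }

/* Returns the number of objects still alive afterwards (0 means the
   references were balanced), or -1 on allocation failure. */
int toy_demo(void)
{
    //>
    toy_obj *box = toy_new();
    //> box+?
    if (!box)
        return -1;
    //> box+
    toy_obj *value = toy_new();
    //> box+ value+?
    if (!value) {
        toy_decref(box);
        //>
        return -1;
    }
    //> box+ value+
    toy_set_item(box, value);      /* box now owns that reference */
    //> box+ value
    toy_obj *item = toy_get_item(box);
    //> box+ value item
    (void)item;                    /* no "+", so no toy_decref(item) */
    toy_decref(box);               /* also releases the contained object */
    //>
    return toy_live;
}
```

Calling toy_demo() should return 0, meaning every object was released
exactly once. Adding a toy_decref(item) before toy_decref(box) would drop
a reference the code never owned, and the container would then touch
freed memory when it is destroyed.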


