Embedding Python crash on PyTuple_New

Arnaud Loonstra arnaud at sphaero.org
Tue Nov 23 11:04:32 EST 2021


On 23-11-2021 16:37, MRAB wrote:
> On 2021-11-23 15:17, MRAB wrote:
>> On 2021-11-23 14:44, Arnaud Loonstra wrote:
>>> On 23-11-2021 15:34, MRAB wrote:
>>>> On 2021-11-23 12:07, Arnaud Loonstra wrote:
>>>>> Hi,
>>>>>
>>>>> I've had Python embedded successfully in a program up until now, but
>>>>> I'm now running into weird GC-related segfaults. I'm currently trying
>>>>> to debug this, but my limited understanding of CPython holds me back.
>>>>>
>>>>> I'm creating a tuple in C, but after a while it crashes on creation.
>>>>> That doesn't make sense, which makes me wonder whether something else
>>>>> is happening. Could it be crashing here because the GC is cleaning up
>>>>> something completely unrelated to the allocation of the new tuple?
>>>>> How can I troubleshoot this?
>>>>>
>>>>> I've got CPython compiled with  --with-valgrind --without-pymalloc
>>>>> --with-pydebug
>>>>>
>>>>> In C I'm creating a tuple with the following method:
>>>>>
>>>>> static PyObject *
>>>>> s_py_zosc_tuple(pythonactor_t *self, zosc_t *oscmsg)
>>>>> {
>>>>>       assert(self);
>>>>>       assert(oscmsg);
>>>>>       char *format = zosc_format(oscmsg);
>>>>>
>>>>>       PyObject *rettuple = PyTuple_New((Py_ssize_t) strlen(format) );
>>>>>
>>>>> It segfaults here (frame 16) after 320 times (consistently)
>>>>>
>>>>>
>>>>> 1   __GI_raise             raise.c          49   0x7ffff72c4e71
>>>>> 2   __GI_abort             abort.c          79   0x7ffff72ae536
>>>>> 3   fatal_error            pylifecycle.c    2183 0x7ffff7d84b4f
>>>>> 4   Py_FatalError          pylifecycle.c    2193 0x7ffff7d878b2
>>>>> 5   _PyObject_AssertFailed object.c         2200 0x7ffff7c93cf2
>>>>> 6   visit_decref           gcmodule.c       378  0x7ffff7dadfd5
>>>>> 7   tupletraverse          tupleobject.c    623  0x7ffff7ca3e81
>>>>> 8   subtract_refs          gcmodule.c       406  0x7ffff7dad340
>>>>> 9   collect                gcmodule.c       1054 0x7ffff7dae838
>>>>> 10  collect_with_callback  gcmodule.c       1240 0x7ffff7daf17b
>>>>> 11  collect_generations    gcmodule.c       1262 0x7ffff7daf3f6
>>>>> 12  _PyObject_GC_Alloc     gcmodule.c       1977 0x7ffff7daf4f2
>>>>> 13  _PyObject_GC_Malloc    gcmodule.c       1987 0x7ffff7dafebc
>>>>> 14  _PyObject_GC_NewVar    gcmodule.c       2016 0x7ffff7daffa5
>>>>> 15  PyTuple_New            tupleobject.c    118  0x7ffff7ca4da7
>>>>> 16  s_py_zosc_tuple        pythonactor.c    366  0x55555568cc82
>>>>> 17  pythonactor_socket     pythonactor.c    664  0x55555568dac7
>>>>> 18  pythonactor_handle_msg pythonactor.c    862  0x55555568e472
>>>>> 19  pythonactor_handler    pythonactor.c    828  0x55555568e2e2
>>>>> 20  sphactor_actor_run     sphactor_actor.c 855  0x5555558cb268
>>>>> ... <More>
>>>>>
>>>>> Any pointer really appreciated.
>>>>>
>>>> You're creating a tuple that'll have the same number of members as 
>>>> the length of a string? That looks strange to me.
>>>>
>>>> How are you setting the tuple's members?
>>>
>>> It's from a serialisation format called OSC. The string describes the
>>> type of bytes, every character is a type.
>>>
>>> I'm creating the tuple as follows:
>>>
>>> PyObject *rettuple = PyTuple_New((Py_ssize_t) strlen(format) );
>>>
>>> Then I iterate the OSC message using the format string, (just showing
>>> handling an int (i))
>>>
>>>       char type = '0';
>>>       Py_ssize_t pos = 0;
>>>       const void *data =  zosc_first(oscmsg, &type);
>>>       while(data)
>>>       {
>>>           switch (type)
>>>           {
>>>           case('i'):
>>>           {
>>>               int32_t val = 9;
>>>               int rc = zosc_pop_int32(oscmsg, &val);
>>>               assert(rc == 0);
>>>               PyObject *o = PyLong_FromLong((long)val);
>>>               assert( o );
>>>               rc = PyTuple_SetItem(rettuple, pos, o);
>>>               assert(rc == 0);
>>>               break;
>>>           }
>>>
>>> Full code is here:
>>>
>>> https://github.com/hku-ect/gazebosc/blob/822452dfa27201db274d37ce09e835d98fe500b2/Actors/pythonactor.c#L360 
>>>
>>>
>> Looking at that code, you have:
>>
>>       PyObject *o = Py_BuildValue("s#", str, 1);
>>
>> what I'd check is the type of the 1 that you're passing. Wouldn't the
>> compiler assume that it's an int?
>>
>> The format string tells the function to expect a Py_ssize_t, but how
>> would the compiler know that?
>>
> Looking at https://www.mankier.com/3/zosc, it says for 'T' and 'F' "(no 
> value required)", but you're doing:
> 
>      int rc = zosc_pop_bool(oscmsg, &bl);
> 
> If no value is required, is there a bool there to be popped?

The value argument is only needed so that a user-provided boolean can be
set to the value in the message. No boolean value is encoded in the
message payload, only the T or F in the format string.
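
To illustrate the point that the boolean lives in the type tag rather than
in the payload, here is a minimal sketch. The helper osc_bool_from_tag is
hypothetical and is not part of zosc; it only shows the idea of deriving
the value from the format character.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not a zosc function): derive a boolean from an
 * OSC type tag. 'T' and 'F' carry no payload bytes, so the tag itself
 * is the value. Returns 0 on success, -1 if the tag is not a boolean. */
static int
osc_bool_from_tag(char type, bool *out)
{
    assert(out);
    switch (type) {
    case 'T': *out = true;  return 0;
    case 'F': *out = false; return 0;
    default:  return -1;   /* not a boolean tag, *out is left untouched */
    }
}
```

A caller would pass the current format character and a pointer to its own
bool, much like the zosc_pop_bool call quoted above.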

With regard to Py_BuildValue("s#", str, 1): that's a valid point and
I'll fix it. However, that code is not reached in the segfaults I'm
testing.
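
For reference, a minimal sketch of why the cast matters with a varargs
function. read_length is a hypothetical stand-in for a "s#" consumer, and
ptrdiff_t stands in for Py_ssize_t so the example compiles without
Python.h. In the real code the equivalent would be to define
PY_SSIZE_T_CLEAN before including Python.h and to pass (Py_ssize_t) 1.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical stand-in for a "s#"-style consumer: it reads a string
 * pointer and then a ptrdiff_t length from the variable arguments.
 * The compiler cannot check the caller's argument types against fmt. */
static ptrdiff_t
read_length(const char *fmt, ...)
{
    assert(fmt);
    va_list ap;
    va_start(ap, fmt);
    const char *str = va_arg(ap, const char *);
    ptrdiff_t len = va_arg(ap, ptrdiff_t);  /* caller must pass this exact type */
    va_end(ap);
    assert(str);
    return len;
}
```

Calling read_length("s#", "x", (ptrdiff_t) 1) is well defined; passing a
bare 1 would hand over an int where a ptrdiff_t is read, which is undefined
behaviour on platforms where the two differ in size.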

I'm now testing different parts of the code to see which ones run
stably. I've found that it is stable if I do not process the returned tuple.

PyObject *pReturn = PyObject_CallMethod(self->pyinstance,
                     "handleSocket", "sOsss",
                     oscaddress,
                     py_osctuple,
                     ev->type, ev->name, strdup(ev->uuid));
Py_XINCREF(pReturn);

https://github.com/hku-ect/gazebosc/blob/505b30c46bf3f78d188c3f575c80e294d3db7e5d/Actors/pythonactor.c#L673

and a bit further in the code I convert the Python tuple to an OSC message:

zosc_t *retosc = s_py_zosc(pAddress, pData);

https://github.com/hku-ect/gazebosc/blob/505b30c46bf3f78d188c3f575c80e294d3db7e5d/Actors/pythonactor.c#L732

If I change that line to:

zosc_t *retosc = zosc_create("/temp", "ii", 32, 64);

It runs stable.

I would turn my attention to the s_py_zosc function, but I'm not sure.
Since the errors are GC-related, could they be caused anywhere?

https://github.com/hku-ect/gazebosc/blob/505b30c46bf3f78d188c3f575c80e294d3db7e5d/Actors/pythonactor.c#L286 
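
A minimal sketch of how a reference-count mistake can surface far from
where it was made. This is a toy model, not CPython: toy_obj, toy_tuple and
the functions below are all hypothetical. It assumes the assertion in
visit_decref fires because a container still points at an object that has
already been released, which is an assumption and not something confirmed
for this crash.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a reference-counted object (not CPython's PyObject). */
typedef struct {
    long refcnt;
    int  freed;     /* set once refcnt drops to zero */
} toy_obj;

/* Toy container that owns one reference per slot, like a tuple. */
typedef struct {
    toy_obj *items[4];
    size_t   len;
} toy_tuple;

static void
toy_decref(toy_obj *o)
{
    assert(o);
    if (--o->refcnt == 0)
        o->freed = 1;
}

/* Returns a borrowed reference: the caller must NOT decref it. */
static toy_obj *
toy_tuple_get(toy_tuple *t, size_t i)
{
    assert(t && i < t->len);
    return t->items[i];
}

/* Mimics a collector pass: report the first slot whose object was
 * already released, or -1 if every slot is still alive. */
static int
toy_gc_check(const toy_tuple *t)
{
    assert(t);
    for (size_t i = 0; i < t->len; i++)
        if (t->items[i]->freed)
            return (int) i;
    return -1;
}
```

In this model, a stray toy_decref on a pointer obtained from toy_tuple_get
goes unnoticed at the point of the mistake; only a later toy_gc_check on
the container reports it.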


