SIGSEGV and SIGILL inside PyCFunction_Call

dieter dieter at handshake.de
Thu Jul 20 01:44:26 EDT 2017


Anders Wegge Keller <wegge at wegge.dk> writes:
> ...
>  I have an ongoing issue with my usenet setup. I'm that one dude who doesn't
> want to learn perl. That means that I have to build inn from source, so I
> can enable the python interpreter. That's not so bad, and the errors that
> show up have been something that I have been able to figure out by myself.
> At least up until now. I have an almost 100% repeatable crash, when nnrpd
> performs the user authentication step. Backtracing the core dump gives this:
>
> #0  0x0000564a864e2d63 in ?? ()
> #1  0x00007f9609567091 in call_function (oparg=<optimized out>, 
>     pp_stack=0x7ffda2d801b0) at ../Python/ceval.c:4352
>
> Note:   Line 4352         C_TRACE(x, PyCFunction_Call(func,callargs,NULL));
>
> #2  PyEval_EvalFrameEx (
>     f=Frame 0x7f9604758050, for file /etc/news/filter/nnrpd_auth.py, 
>     line 67, in __init__ (self=<AUTH(dbCursor=<Cursor(_result=None,
>     description=None, rownumber=None, messages=[], _executed=None, 
>
>  ...
>
>  Weird observation #1: Sometimes the reason is SIGSEGV, sometimes it's
> SIGILL. 

Python tends to be sensitive to the stack size. In the past,
there have often been problems because the stack size for threads
was not large enough. I am not sure whether "nnrpd" is multithreaded
and, if so, whether it provides a sufficiently large stack for its threads.
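
To make the stack size point concrete, here is a minimal sketch of how
the limits can be inspected and how a function can be run in a thread with
a larger stack. The helper names ("describe_stack_limits", "run_with_stack")
are made up for illustration. Note the assumption: "threading.stack_size()"
only affects threads that Python itself creates afterwards, not threads
created by the embedding C program, so for "nnrpd" this is a diagnostic
aid rather than a fix.

```python
import resource   # Unix only
import threading


def describe_stack_limits():
    """Return the process stack rlimit and Python's thread stack size."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    return {
        "rlimit_soft": soft,   # resource.RLIM_INFINITY means unlimited
        "rlimit_hard": hard,
        # 0 means "use the platform default"
        "thread_stack_size": threading.stack_size(),
    }


def run_with_stack(func, size_bytes):
    """Run func() in a new Python thread that has size_bytes of stack."""
    outcome = {}

    def target():
        try:
            outcome["value"] = func()
        except Exception as exc:   # hand the error back to the caller
            outcome["error"] = exc

    old_size = threading.stack_size(size_bytes)  # returns the old setting
    try:
        worker = threading.Thread(target=target)
        worker.start()
        worker.join()
    finally:
        threading.stack_size(old_size)           # restore the old setting
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

If a crash disappears when the suspect code runs via
"run_with_stack(some_function, 16 * 1024 * 1024)", that would point
towards a too small stack.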

A "SIGILL" often occurs because a function call has destroyed part
of the stack content (e.g. the saved return address), so that the
return goes to a wrong address (for instance into the midst
of an instruction).

> ...
>  I'm not ready to give up yet, but I need some help proceeding from here.
> What does the C_TRACE really do,

The casing (all upper case letters) indicates a C preprocessor macro.
Search the "*.h" files (and "ceval.c" itself) for its definition.

I suppose that with a normal Python build (no debug build), the
macro will usually just call "PyCFunction_Call".
In addition, it might provide support for debugging, tracing or
profiling (activated by e.g. "pdb.set_trace()" or "sys.setprofile()").
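
The effect of such a hook can be observed from the Python side. The
following is a small sketch, not taken from the INN or CPython sources:
it installs a profile function with "sys.setprofile()" and records the
"c_call" events, which, as far as I can tell, are what the interpreter
reports around calls to C functions. The helper name "record_c_calls"
is made up for illustration.

```python
import sys


def record_c_calls(func):
    """Run func() and return the names of the C functions it called."""
    seen = []

    def profiler(frame, event, arg):
        # For "c_call" events, arg is the C function object being called.
        if event == "c_call":
            seen.append(getattr(arg, "__name__", repr(arg)))

    previous = sys.getprofile()
    sys.setprofile(profiler)
    try:
        func()
    finally:
        # Restoring the previous profiler is itself a C call and
        # may show up as the last entry of the list.
        sys.setprofile(previous)
    return seen
```

For example, "record_c_calls(lambda: len('abc'))" should contain "len".
When no profile function is installed, the C function is simply called.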


> and is there some way of getting a level
> deeper, to see what causes the SEGV. Also, how can the C code end up with an
> illegal instruction?

A likely cause for both "SIGSEGV" and "SIGILL" could be stack corruption
leading to a bad return or badly restored register values.

I would look at the machine instructions (i.e. look at the assembler
level rather than the C level) to find out precisely which instruction
caused the signal.
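
As a complement at the Python level, and assuming a Python 3 interpreter
(the "faulthandler" module is in the standard library since 3.3; I do not
know which version your "inn" build embeds), the filter script could ask
the interpreter to write the Python traceback to a file when "SIGSEGV" or
"SIGILL" arrives. This is only a sketch; the helper name and the log path
are hypothetical. It shows which Python line was executing, not the
faulting machine instruction.

```python
import faulthandler


def enable_crash_traceback(log_path):
    """Dump the Python traceback of all threads to log_path on a fatal signal.

    faulthandler handles SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    # The file must stay open for as long as faulthandler is enabled,
    # so the caller has to keep a reference to the returned object.
    return log_file
```

A call such as "crash_log = enable_crash_traceback('/tmp/nnrpd_crash.log')"
near the top of the script would be enough (the path is just an example).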


Unfortunately, stack corruption is a non-local problem (the point
where the problem is caused is usually far away from the point
where it is observed).

If the problem is not "too small stack size", you might need
a tool to analyse memory overwrites.



