[Cython] Wrong order of __Pyx_DECREF when calling a function with an implicit str → char* conversion.

Stefan Behnel stefan_ml at behnel.de
Mon Jul 7 00:38:22 CEST 2014


Emmanuel Gil Peyrot, 28.05.2014 13:25:
> I was testing my cython codebase on top of pypy/cpyext, and I found a
> memory corruption.  After investigating it with the pypy guys (thanks
> arigato!), we identified it as a cython bug:
> 
>  cdef void test(char *string):
>      print(string)
> 
>  def run(array):
>      test(array[0])
> 
> This code works fine under cpython, but looking at the generated code
> for the last line, the call to test is done with a pointer (__pyx_t_2)
> to some potentially deallocated string (__pyx_t_1), which isn’t freed
> just yet on cpython but is on pypy:
> 
>  __pyx_t_1 = __Pyx_GetItemInt(__pyx_v_array, 0, …);
>  __Pyx_GOTREF(__pyx_t_1);
>  __pyx_t_2 = __Pyx_PyObject_AsString(__pyx_t_1);
>  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
>  __pyx_f_1a_test(__pyx_t_2);
> 
> 
> The obvious solution is to swap the last two lines, it then works fine
> on pypy (although not necessarily if the test function stores the
> pointer somewhere, but it’s not cython’s fault then).
> 
> This issue can also happen with an explicit cast:
> 
>  pointer = <char *>array[0]
>  test(pointer)

I pushed a fix for both cases here:

https://github.com/cython/cython/commit/37e4a20615c323b695bd2ec5db0dc20a25a4417f

There are still a couple of tests failing, but that should be fixable.

The change moves the compile time error to assignment time and postpones
the temp reference cleanup for char* inside of expressions. This makes
temporary char* handling generally safer, including passing it into
external functions (which, as you said, is safe as long as the user takes
the obvious bit of care to not store it beyond the call scope).

The only drawback is that this may keep temporary Python strings alive
longer than strictly necessary (and longer than before), but if that really
becomes a problem, it should be easy to work around by splitting up the
expressions.


> I’m not sure if it should be addressed the same way, because that would
> mean keeping a reference to array[0] for all the lifetime of the
> current scope, but it could still prevent obscure bugs like the memory
> corruption I had.

The generally assumption, and the reason why indexing is considered safer
than e.g. concatenation, is that indexing often returns a duplicated
reference to an object stored in a container. That does not necessarily
apply to PyPy's cpyext, but the change above should at least fix the
problem you found there.

Stefan



More information about the cython-devel mailing list