[Cython] local variable handling in generators

Stefan Behnel stefan_ml at behnel.de
Mon May 23 11:15:07 CEST 2011


Vitja Makarov, 23.05.2011 10:50:
> 2011/5/23 Stefan Behnel:
>> Vitja Makarov, 23.05.2011 10:13:
>>>
>>> With live variable analysis that should be easy to save/restore only
>>> active variables at the yield point.
>>
>> "Active" in the sense of "modified", I suppose? That's what I was expecting.
>
> Active means that the variable's value will be used later. In my example,
> after 'print a', a isn't used anymore.

That's not correct then. In a generator, a modified value must be kept 
alive over a yield, even if it is no longer used afterwards.

We can safely reduce the write-back code to modified values, but we cannot 
reduce it to values that will be used later on.
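
To illustrate the point in plain Python, here is a minimal sketch. It 
assumes CPython's reference counting, and the Tracked class and the events 
list are hypothetical names used only for this demonstration. The local 'a' 
is never read after the yield, yet its value stays alive while the 
generator is suspended:

```python
import gc

events = []

class Tracked:
    """Hypothetical helper that records when it is deallocated."""
    def __del__(self):
        events.append("dealloc")

def gen():
    a = Tracked()                # 'a' is assigned before the yield ...
    yield 1
    events.append("resumed")     # ... and never read again afterwards

g = gen()
next(g)                          # run up to the yield
events.append("suspended")
suspended_snapshot = list(events)

for _ in g:                      # run the generator to completion
    pass
gc.collect()
```

In this sketch the deallocator only runs once the generator has finished, 
which is the behaviour a liveness-based write-back would change.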


>>> Btw now only reaching definitions analysis is implemented. I'm going
>>> to optimize by replacing sets with bitsets. And then try to implement
>>> live variables.
>>>
>>> I'm going to delete variable reference using active variable info, but
>>> that could introduce a small incompatibility with CPython:
>>> a = X
>>> print a #<- a will be decrefed here
>>> print 'the end'
>>
>> That incompatibility is not small at all. It breaks this code:
>>
>>     x = b'abc'
>>     cdef char* c = x
>>
>> Even if 'x' is no longer used after this point, it *must not* get freed
>> before 'c' is going away as well. That's basically impossible to decide, as
>> users may pass 'c' into a function that stores it away for later use.
>
> Yeah. That's hard to detect. But x could be marked as "don't decref
> when not-active"

How would you know that it needs to be marked like that? You won't 
necessarily see in the code that a pointer was taken from the value; 
that might have happened within a called function.
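
A rough Python analogy, using ctypes instead of a Cython char*, and 
assuming CPython (sys.getrefcount is implementation specific). The 
remember() function and the stash list are hypothetical names. The callee 
keeps only a raw address, so nothing at the call site, and nothing in the 
reference count, reveals that 'x' must stay alive:

```python
import ctypes
import sys

stash = []

def remember(buf):
    # Hypothetical callee: stores the raw address, not a reference to buf.
    stash.append(ctypes.addressof(buf))

x = ctypes.create_string_buffer(b"abc")

before = sys.getrefcount(x)
remember(x)                      # looks like an ordinary call
after = sys.getrefcount(x)       # the borrowed address left no trace

# Reading through the address is only valid while 'x' is still alive.
data = ctypes.string_at(stash[0])
```

If 'x' were released right after its last visible use, stash[0] would 
point into freed memory.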


>> I'm fine with deallocating variables that are no longer used after the user
>> explicitly assigned None to them (i.e. replace the None assignment by a
>> simple "DECREF + set to NULL" in that case). I don't think we should be
>> doing more than that.
>
> Hmm. Why should that be NULL if user sets it to None?

Because there is no user visible difference. None will always be available, 
even if the Cython code no longer holds a reference to it. So changing "x = 
None" into "Py_DECREF(x); x=NULL" is just fine, as long as we can make sure 
'x' is never accessed after this point.
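
A small Python sketch of why this is invisible to the user, assuming 
CPython's reference counting (the Big class and the events list are 
illustrative names only). The old value is already released at the point 
of the None assignment, so dropping the reference there instead changes 
nothing observable:

```python
events = []

class Big:
    """Hypothetical stand-in for a large data structure."""
    def __del__(self):
        events.append("freed")

def work():
    x = Big()
    events.append("using x")
    x = None                     # the old value is released right here
    events.append("after x = None")

work()
```

The deallocator runs exactly at the assignment, a point the user chose 
explicitly.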


> For instance:
>
> for i in args:
>      print i
>
> this code will be translated into:
>
> PyObject *i = NULL;
>
> for (;;)
> {
>     tmp = next();
>     if (!tmp) break;
>
>    Pyx_XDECREF(i);
>    i = tmp;
>    print(i);
> }
>
> using active variables information this could be translated into:
>
> PyObject *i = NULL;
>
> for (;;)
> {
>     tmp = next();
>     if (!tmp) break;
>
>    i = tmp;
>    print(i);
>    Pyx_DECREF(i);
> }

That's not correct, though. Python semantics dictate that 'i' must keep its 
value until the end of the function or until it is reassigned, whichever 
comes first. Remember that objects can have deallocators in Python. Those 
must not be called at an undefined point.
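
Here is a minimal Python sketch of the semantics in question, assuming 
CPython's reference counting (the Item class and the events list are 
hypothetical names). Each item is released only when 'i' is rebound, and 
the last one survives the loop until the function returns:

```python
events = []

class Item:
    """Hypothetical object with a user-visible deallocator."""
    def __init__(self, n):
        self.n = n
    def __del__(self):
        events.append(f"dealloc {self.n}")

def loop():
    # The generator expression hands out each Item without keeping it.
    for i in (Item(n) for n in range(2)):
        events.append(f"print {i.n}")
    events.append("after loop")
    return i.n                   # 'i' still holds the last item here

result = loop()
```

Releasing 'i' right after its last use inside the loop body would move 
both deallocator calls, and the final "return i.n" would no longer have a 
value to read.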

The only thing that can safely be special cased is None. It's common in 
Python code to set a variable to None when the value is worth being 
deallocated (e.g. a large data structure). Cython can optimise this as I 
indicated above.

Stefan

