[Cython] local variable handling in generators

Vitja Makarov vitja.makarov at gmail.com
Mon May 23 11:24:07 CEST 2011


2011/5/23 Stefan Behnel <stefan_ml at behnel.de>:
> Vitja Makarov, 23.05.2011 10:50:
>>
>> 2011/5/23 Stefan Behnel:
>>>
>>> Vitja Makarov, 23.05.2011 10:13:
>>>>
>>>> With live variable analysis, it should be easy to save/restore only the
>>>> active variables at the yield point.
>>>
>>> "Active" in the sense of "modified", I suppose? That's what I was
>>> expecting.
>>
>> "Active" means that the variable's value will be used later. In my
>> example, 'a' isn't used anymore after 'print a'.
>
> That's not correct then. In a generator, a modified value must be kept alive
> over a yield, even if it is no longer used afterwards.
>
> We can safely reduce the write-back code to modified values, but we cannot
> reduce it to values that will be used later on.
>

I'm not sure how to get the list of modified variables at a yield point.
Right now I only know which assignments reach the yield point.
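A minimal sketch of one way to derive that list from what reaching definitions already provides, assuming a toy CFG whose nodes are an assignment, a yield, or anything else, and assuming every yield writes all pending locals back to the closure. It is reaching definitions in which a yield kills every definition, reduced to variable names: a local needs writing back at a yield if some assignment to it can reach that yield without passing through another yield. The function `modified_at_yields` and the `(kind, var)` node format are hypothetical, not Cython's flow-control code.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical CFG node: (kind, var) where kind is "assign", "yield" or "other";
# var is the assigned local for "assign" nodes and None otherwise.
Node = Tuple[str, Optional[str]]


def modified_at_yields(
    nodes: Dict[int, Node], succs: Dict[int, List[int]]
) -> Dict[int, Set[str]]:
    """For each yield node, the locals that may have been assigned since the
    previous yield (or function entry) and therefore need writing back."""
    preds: Dict[int, List[int]] = {n: [] for n in nodes}
    for n, targets in succs.items():
        for s in targets:
            preds[s].append(n)

    def in_set(n: int, out: Dict[int, Set[str]]) -> Set[str]:
        result: Set[str] = set()
        for p in preds[n]:
            result |= out[p]  # may-analysis: union over predecessors
        return result

    out: Dict[int, Set[str]] = {n: set() for n in nodes}
    changed = True
    while changed:  # iterate until no OUT set changes (fixed point)
        changed = False
        for n, (kind, var) in nodes.items():
            inn = in_set(n, out)
            if kind == "assign":
                new = inn | {var}  # this assignment makes `var` dirty
            elif kind == "yield":
                new = set()        # assumption: the yield writes everything back
            else:
                new = inn
            if new != out[n]:
                out[n] = new
                changed = True
    # The write-back set of a yield is what is dirty on entry to it.
    return {n: in_set(n, out) for n, (kind, _) in nodes.items() if kind == "yield"}


nodes = {
    0: ("assign", "a"),  # a = 1
    1: ("yield", None),  # yield
    2: ("assign", "b"),  # b = 2
    3: ("other", None),  # if cond:
    4: ("assign", "a"),  #     a = 3
    5: ("yield", None),  # yield
}
succs = {0: [1], 1: [2], 2: [3], 3: [4, 5], 4: [5]}
result = modified_at_yields(nodes, succs)
# first yield: only 'a'; second yield: 'a' and 'b'
```

Because it is a may-analysis, 'a' is written back at the second yield even though it is only assigned on one branch; that is the conservative choice the write-back code needs.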


>
>>>> Btw, only reaching definitions analysis is implemented right now. I'm
>>>> going to optimize it by replacing sets with bitsets, and then try to
>>>> implement live variables analysis.
>>>>
>>>> I'm going to drop variable references using active variable info,
>>>> but that could introduce a small incompatibility with CPython:
>>>> a = X
>>>> print a #<- a will be decrefed here
>>>> print 'the end'
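A minimal sketch of live-variable analysis with bitsets, assuming a toy CFG whose blocks are summarised by (use, def) sets of local names, with Python ints serving as the bitsets. `live_variables`, `to_mask` and `from_mask` are hypothetical helpers, not Cython code, and globals such as X are simply left out of the use sets. Applied to the three-line example above, it reports that 'a' is live into 'print a' and dead after it.

```python
from typing import Dict, Iterable, List, Set, Tuple


def to_mask(names: Iterable[str], index: Dict[str, int]) -> int:
    """Pack a set of variable names into an int bitset."""
    mask = 0
    for name in names:
        mask |= 1 << index[name]
    return mask


def from_mask(mask: int, index: Dict[str, int]) -> Set[str]:
    """Unpack an int bitset back into variable names."""
    return {name for name, bit in index.items() if (mask >> bit) & 1}


def live_variables(
    blocks: Dict[str, Tuple[Set[str], Set[str]]],
    succs: Dict[str, List[str]],
) -> Tuple[Dict[str, int], Dict[str, int], Dict[str, int]]:
    """Backward live-variable analysis over a CFG, using ints as bitsets.

    blocks maps a block name to (use, def): `use` holds variables read before
    any write in the block, `def` holds variables written in the block.
    Returns (live_in, live_out, index) where the first two map blocks to masks.
    """
    names = sorted({v for use, defs in blocks.values() for v in use | defs})
    index = {name: bit for bit, name in enumerate(names)}
    use_m = {b: to_mask(u, index) for b, (u, _) in blocks.items()}
    def_m = {b: to_mask(d, index) for b, (_, d) in blocks.items()}
    live_in = {b: 0 for b in blocks}
    live_out = {b: 0 for b in blocks}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for b in blocks:
            out = 0
            for s in succs.get(b, []):
                out |= live_in[s]                # OUT[b] = union of IN[succ]
            inn = use_m[b] | (out & ~def_m[b])   # IN[b] = use | (OUT - def)
            if inn != live_in[b] or out != live_out[b]:
                live_in[b], live_out[b] = inn, out
                changed = True
    return live_in, live_out, index


blocks = {
    "a = X": (set(), {"a"}),
    "print a": ({"a"}, set()),
    "print 'the end'": (set(), set()),
}
succs = {"a = X": ["print a"], "print a": ["print 'the end'"]}
live_in, live_out, index = live_variables(blocks, succs)
print(from_mask(live_in["print a"], index))   # {'a'}
print(from_mask(live_out["print a"], index))  # set()
```

With ints, the union and "minus def" steps become single `|` and `& ~` operations, which is the saving that replacing sets with bitsets is after.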
>>>
>>> That incompatibility is not small at all. It breaks this code:
>>>
>>>    x = b'abc'
>>>    cdef char* c = x
>>>
>>> Even if 'x' is no longer used after this point, it *must not* get freed
>>> before 'c' goes away as well. That's basically impossible to decide, as
>>> users may pass 'c' into a function that stores it away for later use.
>>
>> Yeah. That's hard to detect. But x could be marked as "don't decref
>> when not active".
>
> How would you know that it needs to be marked like that? You won't
> necessarily see in the code that a pointer was taken from the value; that
> might have happened within a called function.
>
>
>>> I'm fine with deallocating variables that are no longer used after the
>>> user explicitly assigned None to them (i.e. replace the None assignment
>>> by a simple "DECREF + set to NULL" in that case). I don't think we should
>>> be doing more than that.
>>
>> Hmm. Why should that be NULL if the user sets it to None?
>
> Because there is no user-visible difference. None will always be available,
> even if the Cython code no longer holds a reference to it. So changing "x =
> None" into "Py_DECREF(x); x=NULL" is just fine, as long as we can make sure
> 'x' is never accessed after this point.
>
>
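A minimal sketch of the None special case as a compile-time pass, assuming straight-line code and a made-up statement IR (`Stmt` and `clear_dead_none_assignments` are hypothetical names for illustration). It walks the statements backwards with a live set and rewrites `x = None` into a "clear" (Py_DECREF plus setting the pointer to NULL) only when x is not read again before being reassigned, which is the "never accessed after this point" condition. Branches and loops would need CFG-wide liveness instead.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Stmt:
    # Hypothetical straight-line IR, for illustration only:
    #   "assign"      target = <expression reading `reads`>
    #   "assign_none" target = None
    #   "use"         an expression that only reads `reads`
    #   "clear"       Py_DECREF(target); target = NULL
    kind: str
    target: Optional[str] = None
    reads: FrozenSet[str] = frozenset()


def clear_dead_none_assignments(stmts: List[Stmt]) -> List[Stmt]:
    """Turn 'x = None' into a 'clear' when x is not read before being reassigned."""
    live: Set[str] = set()  # variables that are read later, before any reassignment
    out: List[Stmt] = []
    for s in reversed(stmts):
        if s.kind == "assign_none" and s.target not in live:
            out.append(Stmt("clear", s.target))
        else:
            out.append(s)
        # Backward liveness transfer: live_in = (live_out - defs) | uses
        if s.target is not None:
            live.discard(s.target)
        live |= s.reads
    out.reverse()
    return out


prog = [
    Stmt("assign", "data", frozenset({"n"})),  # data = make_big(n)
    Stmt("use", reads=frozenset({"data"})),    # print(len(data))
    Stmt("assign_none", "data"),               # data = None
    Stmt("use", reads=frozenset({"n"})),       # print(n)
]
print([s.kind for s in clear_dead_none_assignments(prog)])
# ['assign', 'use', 'clear', 'use']
```

If 'data' were read again after the None assignment, the statement would be left alone, since the code would then observe None rather than a NULL pointer.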
>> For instance:
>>
>> for i in args:
>>     print i
>>
>> this code will be translated into:
>>
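>> PyObject *i = NULL;
>> PyObject *tmp;
>>
>> for (;;)
>> {
>>     tmp = PyIter_Next(it);  /* it: iterator over args; NULL when exhausted */
>>     if (!tmp) break;
>>
>>     Py_XDECREF(i);          /* release the previous value of i */
>>     i = tmp;
>>     print(i);
>> }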
>>
>> using active variable information, this could be translated into:
>>
>> PyObject *i = NULL;
>> PyObject *tmp;
>>
>> for (;;)
>> {
>>     tmp = PyIter_Next(it);
>>     if (!tmp) break;
>>
>>     i = tmp;
>>     print(i);
>>     Py_CLEAR(i);            /* i is dead here: DECREF and reset to NULL */
>> }
>
> That's not correct, though. Python semantics dictate that 'i' must keep its
> value until the end of the function or until it is reassigned, whichever
> comes first. Remember that objects can have deallocators in Python, and
> those must not be called at an undefined point.
>
> The only thing that can safely be special-cased is None. It's common in
> Python code to set a variable to None when the value it holds should be
> deallocated (e.g. a large data structure). Cython can optimise this as I
> indicated above.
>
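The point about deallocators can be checked in plain Python on CPython, where reference counting makes `__del__` timing deterministic. This sketch mirrors the `for i in args` loop with freshly created objects; `Tracked`, `make` and `loop` are hypothetical names for illustration.

```python
events = []


class Tracked:
    """Records when CPython runs its deallocator (__del__)."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        events.append("del " + self.name)


def make(names):
    # Yields fresh objects, so the loop variable holds the only reference.
    for n in names:
        yield Tracked(n)


def loop(names):
    for i in make(names):
        events.append("use " + i.name)
    events.append("after loop")  # 'i' still references the last item here


loop(["x", "y"])
print(events)
# ['use x', 'del x', 'use y', 'after loop', 'del y']  (on CPython)
```

'del x' fires when 'i' is rebound and 'del y' only when the function returns. Decref-after-last-use would move 'del y' ahead of 'after loop', which is exactly the observable change in behaviour.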

Ohh, I see that variable references can't simply be removed.

Removing the reference to an unused result still seems safe to me:

a = foo() # a will be assigned to NULL here
print

-- 
vitja.


More information about the cython-devel mailing list