[Cython] freelist benchmarks

Stefan Behnel stefan_ml at behnel.de
Wed Feb 27 14:09:16 CET 2013


Robert Bradshaw, 27.02.2013 09:54:
> On Tue, Feb 26, 2013 at 11:24 PM, Stefan Behnel wrote:
>> I imagine that the freelist could leave the initial vtable untouched in
>> some cases, but that would mean that we need a freelist per actual type,
>> instead of object struct size.
>>
>> Now, if we move the freelist handling into each subtype (as you and Mark
>> proposed already), we'd get some of this for free, because the objects that
>> get freed are already properly set up for the specific type, including
>> vtable etc. All that remains to be done is to zero out the (known) C typed
>> attributes, set the (known) object attributes to None, and call any
>> __cinit__() methods in the super types to do the rest for us. We might have
>> to do it in the right order, i.e. initialise some attributes, call the
>> corresponding __cinit__() method, initialise some more attributes, ...
>>
>> So, basically, we'd manually inline the bottom-up aggregation of all tp_new
>> functions into the current one, skipping those operations that we don't
>> consider necessary in the freelist case, such as the vtable setup.
>>
>> Now, the only remaining issue is how to get at the __cinit__() functions if
>> the base type isn't in the same module, but as Mark proposed, that could
>> still be done if we require it to be exported in a C-API (and assume that
>> it doesn't exist if not?). Would be better to know it at compile time,
>> though...
> 
> Yes, and that's still going to (potentially) be expensive. I'd rather
> have a way of controlling what, if anything, gets zero'd out/set to
> None, as most of that (in Sage's case at least) will still be valid
> for the newly-reused type or instantly over-written (though perhaps
> the default could be to call __dealloc__/__cinit__). With this we
> could skip going up and down the type hierarchy at all.

I don't think the zeroing is a problem. Just bursting out static data to
memory should be plenty fast these days and not incur any wait cycles or
pipeline stalls, as long as the compiler/processor can figure out that
there are no interdependencies between the assignments. The None
assignments may be a problem due to the INCREFs, but even in that case, the
C compiler and processor should be able to detect that they are all just
incrementing the same address in memory and may end up reducing a series of
updates into one. The only real problem are the calls to __cinit__(), which
run user code and can thus do anything. If they can't be inlined, the C
compiler needs to lessen a lot of its assumptions.

Would it make sense to require users to implement __cinit__() as an inline
method in a .pxd file if they want to use a freelist on a subtype? Or would
that be overly restrictive? It would prevent them from using module
globals, for example. That's quite a restriction normally, but I'm not sure
how much it hurts the "average" code in the specific case of __cinit__().

Stefan



More information about the cython-devel mailing list