[Cython] [cython-users] freelist benchmarks

Stefan Behnel stefan_ml at behnel.de
Wed Feb 27 08:24:55 CET 2013


Robert Bradshaw, 26.02.2013 21:16:
> On Mon, Feb 25, 2013 at 1:17 AM, Stefan Behnel wrote:
>> David Roe, 25.02.2013 00:00:
>>> I changed the current type pointer check to look at tp_basicsize instead.
>>>
>>>> That made it work for almost all classes in lxml's own Element hierarchy,
>>>> with only a couple of exceptions in lxml.objectify that have one additional
>>>> object field. So, just extending the freelist support to use two different
>>>> lists for different struct sizes instead of just one would make it work for
>>>> all of lxml already. Taking a look at Sage to see how the situation appears
>>>> over there would be interesting, I guess.
>>>
>>> I found some chains of length 5.  This could be shortened to 4 by putting
>>> the freelist at the level of Element (which is where you most care about
>>> speed of object creation).
>>
>> It's substantially easier to keep it in the top-level base class, though.
>> Otherwise, we'd need a new protocol between inheriting types as I
>> previously described. That add a *lot* of complexity.
>>
>>
>>> SageObject
>>>     -> Element (_parent attribute and cdef methods)
>>>     -> Vector (_degree)
>>>     -> FreeModuleElement (_is_mutable)
>>>     -> FreeModuleElement_generic_dense (_entries)
>>>
>>> SageObject
>>>     -> Element (_parent attribute and cdef methods)
>>>     ->sage.structure.element.Matrix (_nrows)
>>>     -> sage.matrix.matrix.Matrix (_base_ring)
>>>     -> Matrix_integer_dense (_entries)
> 
> I don't know that (expensive) matrices are the best example, and often
> the chains are larger for elements one really cares about.
> 
> sage: def base_tree(x): return [] if x is None else [x] +
> base_tree(x.__base__)
>    ...:
> 
> sage: base_tree(Integer)
>  [<type 'sage.rings.integer.Integer'>, <type
> 'sage.structure.element.EuclideanDomainElement'>, <type
> 'sage.structure.element.PrincipalIdealDomainElement'>, <type
> 'sage.structure.element.DedekindDomainElement'>, <type
> 'sage.structure.element.IntegralDomainElement'>, <type
> 'sage.structure.element.CommutativeRingElement'>, <type
> 'sage.structure.element.RingElement'>, <type
> 'sage.structure.element.ModuleElement'>, <type
> 'sage.structure.element.Element'>, <type
> 'sage.structure.sage_object.SageObject'>, <type 'object'>]
> 
> sage: base_tree(RealDoubleElement)
>  [<type 'sage.rings.real_double.RealDoubleElement'>, <type
> 'sage.structure.element.FieldElement'>, <type
> 'sage.structure.element.CommutativeRingElement'>, <type
> 'sage.structure.element.RingElement'>, <type
> 'sage.structure.element.ModuleElement'>, <type
> 'sage.structure.element.Element'>, <type
> 'sage.structure.sage_object.SageObject'>, <type 'object'>]
> 
> sage: base_tree(type(mod(1, 10)))
>  [<type 'sage.rings.finite_rings.integer_mod.IntegerMod_int'>, <type
> 'sage.rings.finite_rings.integer_mod.IntegerMod_abstract'>, <type
> 'sage.rings.finite_rings.element_base.FiniteRingElement'>, <type
> 'sage.structure.element.CommutativeRingElement'>, <type
> 'sage.structure.element.RingElement'>, <type
> 'sage.structure.element.ModuleElement'>, <type
> 'sage.structure.element.Element'>, <type
> 'sage.structure.sage_object.SageObject'>, <type 'object'>]

My original question was if they have differently sized object structs or
not. Those that don't would currently go into the same freelist.


>> Ok, so even for something as large as Sage, we'd apparently end up with
>> just a couple of freelists for a given base type. That really makes it
>> appear reasonable to make that number a compile time constant as well. I
>> mean, even if you *really* oversize it, all you loose is the static memory
>> for a couple of pointers. On a 64 bit system, if you use a freelist size of
>> 8 objects and provision freelists for 8 differently sized subtypes, that's
>> 8*8*8 bytes in total, or half a KB, statically allocated. Even a hundred
>> times that size shouldn't hurt anyone. Unused subtype freelists really take
>> almost no space and won't hurt performance either.
> 
> Elements in Sage are typically larger than 8 bytes

I wasn't adding up the size of the objects, only of the pointers in the
freelists. If the objects end up in the freelist, they'll also be used on
the next instantiation, so their size doesn't really matter.


> and our
> experiments for Integer showed that the benefit (for this class)
> extended well beyond 8 items. On the other hand lots of elements are
> so expensive that they don't merit this at all.
>
> I think one thing to keep in mind is that Python's heap is essentially
> a "freelist" of objects of every size up to 128(?) bytes, so what are
> we trying to save by putting it at the base type and going up and down
> the __cinit__/__dealloc__ chain?

Allocation still seems to take its time.


> I suppose we save the zero-ing out of
> memory and a function call or two, but that's not the expensive part.

And I noticed now that we still have to do the zeroing out in order to
initialise C typed attributes. And that it's actually not trivial to figure
out in what cases we can safely put a subtype into the freelist. There are
a couple of special cases in CPython's object allocation, e.g. for heap
types. Their instances own a reference to the type, which is not the case
for static types.


> For our Integer free list, we save going up the __cinit__/__dealloc__
> call, initializing a couple of members, setting the vtable pointers,
> which turns out to be the bulk of the cost.

And your hierarchy examples above show that that they are implemented
across multiple modules. I can imagine that being a major problem as the C
compiler can't inline the tp_new calls in that case, can't really reorder
the struct field assignments, etc.

I imagine that the freelist could leave the initial vtable untouched in
some cases, but that would mean that we need a freelist per actual type,
instead of object struct size.

Now, if we move the freelist handling into each subtype (as you and Mark
proposed already), we'd get some of this for free, because the objects that
get freed are already properly set up for the specific type, including
vtable etc. All that remains to be done is to zero out the (known) C typed
attributes, set the (known) object attributes to None, and call any
__cinit__() methods in the super types to do the rest for us. We might have
to do it in the right order, i.e. initialise some attributes, call the
corresponding __cinit__() method, initialise some more attributes, ...

So, basically, we'd manually inline the bottom-up aggregation of all tp_new
functions into the current one, skipping those operations that we don't
consider necessary in the freelist case, such as the vtable setup.

Now, the only remaining issue is how to get at the __cinit__() functions if
the base type isn't in the same module, but as Mark proposed, that could
still be done if we require it to be exported in a C-API (and assume that
it doesn't exist if not?). Would be better to know it at compile time,
though...

Stefan



More information about the cython-devel mailing list