[Cython] Shared Cython runtime (was: Upcoming cython/numpy breakage with stride checking)

Stefan Behnel stefan_ml at behnel.de
Tue Apr 9 21:28:11 CEST 2013


Robert Bradshaw, 09.04.2013 21:00:
> On Tue, Apr 9, 2013 at 7:22 AM, Nathaniel Smith wrote:
>> On Tue, Apr 9, 2013 at 3:15 PM, Stefan Behnel wrote:
>>> Ok, got it now. That solves the distribution problem, assuming that all
>>> installed runtimes with a given version are equivalent. Basically, we'd
>>> move the code out and then cimport the stuff back that we need, which would
>>> then let the first import of a given runtime version insert it into
>>> sys.modules.
>>
>> Yeah, that's the definition of "given version" :-), and any kind of
>> shared runtime does require versioning. If we wanted to be extra
>> careful we could put the shared code into its own block of boilerplate
>> that gets injected into each generated .c file, and then have the
>> "version" be the sha1 of that block of boilerplate... the trade-offs
>> depend on what exactly is getting shared.
> 
> I actually thought a lot about this way back, and one of the tricky
> bits is that identical C code can have entirely different (and
> incompatible) meanings depending on typedefs, macros, etc. We'd have to
> use the preparsed C. Also, the runtime should be the union of utility
> code (some of it dynamically generated), which is different for each
> individual module.
> 
> That being said, there's probably a huge chunk of common boilerplate
> (generators, bound functions, memory views) that could get shared
> without too much effort. We could also share .h files for much of this
> (which would cut down on disk usage and compilation time). Also, as a
> first pass, we could have a single, dynamically generated (and
> hash-named) runtime file for a single call to cythonize(...) that
> would allow much sharing within a project without worrying about
> cross-project versioning (and it'd be easier logistically to create
> this as a new, external module rather than package it up in each .so
> file).
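A rough Python model of the versioned scheme discussed above may help. The runtime's name is derived from a sha1 over the shared boilerplate and, per Robert's caveat, over the macro/typedef settings that change its meaning. The first module to import a given name registers it in `sys.modules`, and later imports reuse it. Everything here is hypothetical: `runtime_module_name`, `get_shared_runtime`, the `_cython_runtime_` prefix and the `config` dictionary are illustrative names, not Cython API, and a real implementation would live in generated C init code.

```python
import hashlib
import sys
import types


def runtime_module_name(boilerplate, config):
    """Name a shared runtime after its exact code and build settings.

    Hashing the text alone is not enough: the same C code can mean different
    things under different macros/typedefs, so those settings go into the hash.
    """
    h = hashlib.sha1(boilerplate.encode("utf-8"))
    for key, value in sorted(config.items()):  # sorted: order-independent name
        h.update(("\0%s=%s" % (key, value)).encode("utf-8"))
    return "_cython_runtime_" + h.hexdigest()[:16]


def get_shared_runtime(name):
    """Return the runtime registered under `name`, creating it on first import."""
    runtime = sys.modules.get(name)
    if runtime is None:
        # First importer of this exact runtime: build it once and register it.
        runtime = types.ModuleType(name)
        # Shared types (generators, bound functions, memory views) would go here.
        sys.modules[name] = runtime
    return runtime


SHARED_C = "/* placeholder for the shared C boilerplate */\n"

# USE_FAST_PATH is a hypothetical build macro, used only for illustration.
name_a = runtime_module_name(SHARED_C, {"USE_FAST_PATH": "1"})
name_b = runtime_module_name(SHARED_C, {"USE_FAST_PATH": "0"})
rt1 = get_shared_runtime(name_a)
rt2 = get_shared_runtime(name_a)
rt3 = get_shared_runtime(name_b)
print(rt1 is rt2, rt1 is rt3)  # True False
```

This relies on the assumption stated above, that any two runtimes with the same name really are interchangeable. Hashing a hand-written list of settings only covers the settings someone remembered to list, which is exactly why the typedef/macro issue makes this fragile.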

Right, and that gives us two safe options already: users can build a single
meta-module from multiple Cython modules, or they can build one shared
Cython runtime library for an entire PyPI package. In both cases, no
versioning is required, as the complete thing would always be expected to
be built (and installed) in one go.
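The per-project variant (one runtime per `cythonize(...)` call or per package) can be sketched as taking the union of the utility code each module needs and naming the single generated file after its content. This is only a Python sketch under stated assumptions: `build_project_runtime`, the `module_utilities` mapping and the `_project_runtime_` file naming are hypothetical and not something `cythonize()` provides. The conflict check reflects the point that dynamically generated utility code may differ between modules.

```python
import hashlib


def build_project_runtime(module_utilities):
    """Merge the utility code needed by several modules into one runtime source.

    module_utilities maps a module name to {utility_name: c_code}. Returns a
    hash-based file name and the merged C source.
    """
    merged = {}
    for utilities in module_utilities.values():
        for util_name, code in utilities.items():
            existing = merged.get(util_name)
            if existing is not None and existing != code:
                # Dynamically generated utility code may differ per module;
                # refuse to merge two different variants under one name.
                raise ValueError("conflicting definitions for %r" % util_name)
            merged[util_name] = code
    # Sort by utility name so the same inputs always produce the same file.
    source = "\n".join(merged[name] for name in sorted(merged))
    digest = hashlib.sha1(source.encode("utf-8")).hexdigest()[:12]
    return "_project_runtime_%s.c" % digest, source


filename, source = build_project_runtime({
    "pkg.a": {"Generator": "/* generator type */",
              "MemoryView": "/* memory view type */"},
    "pkg.b": {"Generator": "/* generator type */",
              "CyFunction": "/* bound function type */"},
})
print(filename)
print(source)
```

Since everything in the package is built and installed together, the hash only has to be stable within one build; it never has to act as a cross-project version.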


> I am curious what percentage of the final .so file is boilerplate. A
> huge chunk of the .c code often is, but that consists largely of
> specialized macros and inline functions that (ideally) optimize away to
> nearly nothing.

I think the advantage lies more in the ability to use exactly the same type
for functions, generators and memory views across a larger code base,
rather than having a separate one for each individual module.
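The benefit of sharing types can be illustrated in plain Python, as a loose analogy for the C-level types: two separately created classes with identical definitions are still distinct, so an object made by one module fails a type check in another, whereas a type fetched from one shared runtime is the same object everywhere. `make_function_type`, `shared_function_type` and the `_shared_runtime_demo` module are hypothetical names used only for this illustration.

```python
import sys
import types


def make_function_type():
    """Create a fresh class, standing in for one module's private C type."""
    class FunctionLike:
        pass
    return FunctionLike


# Without sharing: each "module" defines its own copy of the type.
type_in_a = make_function_type()
type_in_b = make_function_type()
obj_from_a = type_in_a()
print(isinstance(obj_from_a, type_in_b))  # False: same definition, different type


def shared_function_type():
    """Return the one FunctionLike type registered in a shared runtime module."""
    runtime = sys.modules.get("_shared_runtime_demo")
    if runtime is None:
        runtime = types.ModuleType("_shared_runtime_demo")
        sys.modules["_shared_runtime_demo"] = runtime
    if not hasattr(runtime, "FunctionLike"):
        runtime.FunctionLike = make_function_type()
    return runtime.FunctionLike


# With sharing: both "modules" get the same type object.
shared_in_a = shared_function_type()
shared_in_b = shared_function_type()
print(isinstance(shared_in_a(), shared_in_b))  # True
```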

I don't think the average utility function would make any difference,
especially because most of them are utility functions exactly because we
want to inline them. There's a possible exception for really high-level
code like import, print, exec or things like that, but I don't think that
amounts to much, even when you sum it all up.

So, sharing types is good; sharing functions is IMHO mostly useless.

Stefan



More information about the cython-devel mailing list