[Cython] Utilities, cython.h, libcython

Wed Oct 5 15:54:02 CEST 2011

On 5 October 2011 08:38, Robert Bradshaw <robertwb at math.washington.edu> wrote:
> On Wed, Oct 5, 2011 at 12:16 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
>> mark florisson, 04.10.2011 23:19:
>>>
>>> So I propose that after fused types gets merged we try to move as many
>>> utility codes as possible to their utility code files (unless they are
>>> used in pending pull requests or other branches). Preferably this will
>>> be done in one or a few commits. How should we split up the work
>>
>> I would propose that new utility code gets moved out into utility files
>> right away (if doable, given the current state of the infrastructure), and
>> that existing utility code gets moved when it gets modified or when someone
>> feels like it. Until we really get to the point of wanting to create a
>> separate shared library etc., there's no need to hurry with the move.
>>
>>
>>> We could actually move things before fused types get merged, as long
>>> as we don't touch binding_cfunc_utility_code.
>>
>> Another reason not to hurry, right?
>>
>>
>>> Before we go there, Stefan, do we still want to implement the header
>>> .ini style which can list dependencies and such?
>>
>> I think we'll eventually need that, but that also depends a bit on the
>> question whether we want to (or can) build a shared library or not. See
>> below.
>>
>>
>>> Another issue is that Cython compile time is increasing with the
>>> addition of control flow and cython utilities. If you use fused types
>>> you're also going to combinatorially add more compile time.
>>
>> I don't see that locally - a compiled Cython is hugely fast for me. In
>> comparison, the C compiler literally takes ages to compile the result. An
>> external shared library may or may not help with both - in particular, it is
>> not clear to me what makes the C compiler slow. If the compile time is
>> dominated by the number of inlined functions (which is not unlikely), a
>> shared library + header file will not make a difference.
>>
>>
>>> I'm sure
>>> this came up earlier, but I really think we should have a libcython
>>> and a cython.h. libcython (a shared library) should contain any common
>>> Cython-specific code not meant to be inlined, and cython.h any types,
>>> macros and inline functions etc.
>>
>> This has a couple of implications though. In order to support this on the
>> user side, we have to build one shared library per installed package in
>> order to avoid any Cython versioning issues. Just installing a versioned
>> "libcython_x.y.z.so" globally isn't enough, especially during development,
>> but also at deployment time. Different packages may use different CFLAGS or
>> Cython options, which may have an impact on the result. Encoding all
>> possible factors in the file name will be cumbersome and may mean that we
>> still end up with a number of installed Cython libraries that correlates
>> with the number of installed Cython based packages.
>
> That's a good point. Perhaps an easier first target is to have one
> "libcython" per package (with a randomized or project-specific name).
> Longer-term, I think the goal of one libcython per version is a
> reasonable one, for deployment at least. Exceptional packages (e.g.
> that require a special set of CFLAGS rather than the ones Python was
> built with) can either bundle their own or forgo any sharing of code
> as it is done now, and features that can't be easily normalized across
> (cython and c) compilation options would remain in project-specific
> generated .c files.
>
>> Next, we may not know at build time which set of Cython modules is in the
>> package. This may be less of an issue if we rely on "cythonize()" in
>> setup.py to compile all modules beforehand (assuming that the user doesn't
>> call it twice, once for *.pyx, once for *.py, for example), but even if we
>> know all modules, we'd still have to figure out the complete set of utility
>> code used by all modules in order to build an adapted library with only the
>> necessary code used in the package. So we'd always end up with a complete
>> library with all utility code, which is only really interesting for larger
>> packages with several Cython modules.
>
> Yes, I'm thinking we would create relatively complete libraries,
> though if we did things on a per package level perhaps we could do
> some pruning. We could still conditionally put some of the utility
> code (especially the rarely used or shared stuff) into each module.

Yeah, that would be nice. I actually think we shouldn't do anything on
a per-package level, only a bunch of modules with related stuff
(conversion utilities/exception raising etc. in one module,
buffer/memoryview utilities in another, etc.). We've been living with
huge files until now; I don't think we suddenly need to start actively
pruning for a little bit of memory.

I think the module approach would also be easy to implement, as the
infrastructure for external cdef functions/classes importing/exporting
is already there.
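
A rough sketch of that grouping, purely illustrative: the module names
(cy_core, cy_buffers) and the utility names are hypothetical and do not
correspond to anything that exists in Cython. It only shows how a module's
set of used utilities could be mapped to the shared modules it would need
to cimport, with anything ungrouped staying in the generated .c file.

```python
# Hypothetical grouping of utility code into themed shared modules.
# None of these names are real Cython identifiers.
UTILITY_GROUPS = {
    "cy_core": {"int_from_py", "int_to_py", "raise_exception", "add_traceback"},
    "cy_buffers": {"buffer_acquire", "memoryview_slice"},
}


def shared_modules_needed(used_utilities):
    """Return the sorted names of the shared modules a compiled module must import."""
    used = set(used_utilities)
    return sorted(name for name, members in UTILITY_GROUPS.items() if members & used)


def unassigned(used_utilities):
    """Return utilities with no shared home; these would stay in the generated .c file."""
    known = set().union(*UTILITY_GROUPS.values())
    return sorted(set(used_utilities) - known)
```

With this layout a module that only raises exceptions pulls in cy_core and
nothing else, so no per-package pruning step is required.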

>> I agree with Robert that a CEP would be needed for this, both for clearing
>> up the implications and actual use cases (I know that Sage is a reasonable
>> use case, but it's also a rather special case).
>>
>>
>>> This will decrease Cython and C
>>> compile time, and will also make executables smaller.
>>
>> I don't see how this actually impacts executables. However, a self-contained
>> executable is a value in itself.
>
> As an example, we're starting to have full utility types, e.g. for
> generators and/or CyFunction. Lots of the utility code (e.g. loading
> modules, raising exceptions, etc.) could be shared as well. For
> something like Sage that could be a significant savings, and it could
> be a big boon for cython.inline as well.
>
>>> This could be
>>> enabled using a command line option to Cython, as well as with
>>> distutils; eventually we may decide to make it the default (let's
>>> figure that out later). Preferably libcython.so would be installed
>>> alongside libpython.so and cython.h inside the Python include
>>> directory.
>>
>> I don't see this happening. It's easy for Python (there is only one Python
>> running at a time, with one libpython loaded), but it's a lot less safe for
>> different versions of a Cython library that are used by different modules
>> inside of the running Python. For example, we'd have to version all visible
>> symbols in operating systems with flat namespaces, in order to support
>> loading multiple versions of the library.
>
> Which is another advantage to "linking" via the cimport mechanisms.
>
>>> Lastly, I think we also should figure out a way to serialize Entry
>>> objects from CythonUtilities, which could easily and swiftly be loaded
>>> when creating the cython scope. It's quite a pain to declare all
>>> entries for utilities you write manually
>>
>> Why would you declare them manually? I thought everything would be moved out
>> into the utility code files?
>>
>>
>>> so what I mostly did was
>>> parse the utility up to and including AnalyseDeclarationsTransform,
>>> and then retrieve the entries from there.
>>
>> Sounds like a drawback regarding the processing time, but may still be a
>> reasonable way to do it. I would expect that it won't be hard to pickle the
>> resulting dict of entries into a cache file and rebuild it only when one of
>> the utility files changes.
>
> +1
>
> It'd be great to be able to do this for the many .pxd files in Sage as
> well. Parsing .pxd files is a huge portion of the compilation of the
> Sage library.
>
> - Robert
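
Stefan's suggestion above (pickle the dict of entries into a cache file and
rebuild it only when one of the utility files changes) might look roughly
like the sketch below. It is a minimal sketch under assumptions:
build_entries is a hypothetical callable standing in for the expensive step
(parsing the utilities and running the pipeline up to and including
AnalyseDeclarationsTransform), and whether real Entry objects pickle cleanly
is not established here. Change detection uses a content hash of the utility
files; comparing modification times would be a cheaper alternative.

```python
import hashlib
import os
import pickle


def utility_fingerprint(paths):
    """Hash the names and contents of all utility files, in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(path.encode("utf-8"))
        with open(path, "rb") as f:
            digest.update(f.read())
    return digest.hexdigest()


def load_entries(paths, cache_file, build_entries):
    """Return the entries dict, rebuilding it only when a utility file changed.

    build_entries(paths) is a hypothetical stand-in for the expensive
    parse-and-analyse step; it must return a picklable dict.
    """
    fingerprint = utility_fingerprint(paths)
    if os.path.exists(cache_file):
        try:
            with open(cache_file, "rb") as f:
                cached = pickle.load(f)
            if cached.get("fingerprint") == fingerprint:
                return cached["entries"]
        except (pickle.UnpicklingError, EOFError, AttributeError, KeyError):
            pass  # corrupt or incompatible cache: fall through and rebuild

    entries = build_entries(paths)
    with open(cache_file, "wb") as f:
        pickle.dump({"fingerprint": fingerprint, "entries": entries}, f,
                    pickle.HIGHEST_PROTOCOL)
    return entries
```

The same scheme would in principle apply to the .pxd case Robert mentions,
with the fingerprint taken over the .pxd files instead.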