[Cython] Utilities, cython.h, libcython

Wed Oct 5 15:53:24 CEST 2011

On 5 October 2011 08:16, Stefan Behnel <stefan_ml at behnel.de> wrote:
> mark florisson, 04.10.2011 23:19:
>>
>> So I propose that after fused types gets merged we try to move as many
>> utility codes as possible to their utility code files (unless they are
>> used in pending pull requests or other branches). Preferably this will
>> be done in one or a few commits. How should we split up the work
>
> I would propose that new utility code gets moved out into utility files
> right away (if doable, given the current state of the infrastructure), and
> that existing utility code gets moved when it gets modified or when someone
> feels like it. Until we really get to the point of wanting to create a
> separate shared library etc., there's no need to hurry with the move.
>
>
>> We could actually move things before fused types get merged, as long
>> as we don't touch binding_cfunc_utility_code.
>
> Another reason not to hurry, right?
>
>
>> Before we go there, Stefan, do we still want to implement the header
>> .ini style which can list dependencies and such?
>
> I think we'll eventually need that, but that also depends a bit on the
> question whether we want to (or can) build a shared library or not. See
> below.
>
>
>> Another issue is that Cython compile time is increasing with the
>> addition of control flow and cython utilities. If you use fused types
>> you're also going to combinatorially add more compile time.
>
> I don't see that locally - a compiled Cython is hugely fast for me. In
> comparison, the C compiler literally takes ages to compile the result. An
> external shared library may or may not help with both - in particular, it is
> not clear to me what makes the C compiler slow. If the compile time is
> dominated by the number of inlined functions (which is not unlikely), a
> shared library + header file will not make a difference.
>

Have you tried with the memoryviews merged? e.g. if I have this code:

from libc.stdlib cimport malloc
cdef int[:] slice = <int[:10]> <int *> malloc(sizeof(int) * 10)

[0] [14:45] ~  ➤ time cython test.pyx
cython test.pyx  2.61s user 0.08s system 99% cpu 2.695 total
[0] [14:45] ~  ➤ time zsh compile
zsh compile  1.88s user 0.06s system 99% cpu 1.946 total

where 'compile' is a script that invokes the same gcc command
distutils uses. As you can see, Cython took more than 2.5 seconds to
compile this code (simply because the memoryview utilities get
included). The C compiler does it quite a lot faster here. This
obviously depends largely on your code; you can probably have it the
other way around as well.
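
For anyone who wants to repeat the comparison outside the shell, here is a minimal sketch of a timing helper. The name time_command is made up for illustration, and it only measures wall-clock time, not the user/system split that the shell's `time` reports.

```python
import subprocess
import time


def time_command(argv):
    """Run argv and return (elapsed wall-clock seconds, exit code)."""
    start = time.perf_counter()
    result = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start, result.returncode
```

Assuming cython is on the PATH and a 'compile' script like the one above exists, time_command(["cython", "test.pyx"]) and time_command(["zsh", "compile"]) would give the two numbers to compare.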

>> I'm sure
>> this came up earlier, but I really think we should have a libcython
>> and a cython.h. libcython (a shared library) should contain any common
>> Cython-specific code not meant to be inlined, and cython.h any types,
>> macros and inline functions etc.
>
> This has a couple of implications though. In order to support this on the
> user side, we have to build one shared library per installed package in
> order to avoid any Cython versioning issues. Just installing a versioned
> "libcython_x.y.z.so" globally isn't enough, especially during development,
> but also at deployment time. Different packages may use different CFLAGS or
> Cython options, which may have an impact on the result. Encoding all
> possible factors in the file name will be cumbersome and may mean that we
> still end up with a number of installed Cython libraries that correlates
> with the number of installed Cython based packages.
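
To make the file name concern concrete, here is a rough sketch of what encoding those factors could look like. The function library_name, the naming scheme and the 8-character digest are all assumptions for illustration; nothing like this exists in Cython.

```python
import hashlib


def library_name(cython_version, cflags, options):
    """Build a library file name that encodes version, CFLAGS and options.

    Hypothetical scheme: the CFLAGS are kept in their given order (order
    can matter to the compiler), the Cython options are sorted by name.
    """
    option_parts = [f"{name}={value}" for name, value in sorted(options.items())]
    factors = "\n".join(list(cflags) + option_parts)
    digest = hashlib.sha1(factors.encode("utf-8")).hexdigest()[:8]
    return f"libcython_{cython_version}_{digest}.so"
```

Two packages that differ in a single flag or option would get two different names, so the number of installed libraries would still track the number of distinct build configurations.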

Hm, I think the CFLAGS are important so long as they are compatible
with Python. When the user compiles a Cython extension module with
extra CFLAGS, this doesn't affect libpython. Similarly, the Cython
utilities are really not the user's responsibility, so libcython
doesn't need to be compiled with the same flags as the extension
module. If still wanted, the user could either recompile python with
different CFLAGS (which means libcython will get those as well), or
not use libcython at all. CFLAGS should really only pertain to user
code, not to the Cython library, which the user shouldn't be concerned
about.

> Next, we may not know at build time which set of Cython modules is in the
> package. This may be less of an issue if we rely on "cythonize()" in
> setup.py to compile all modules beforehand (assuming that the user doesn't
> call it twice, once for *.pyx, once for *.py, for example), but even if we
> know all modules, we'd still have to figure out the complete set of utility
> code used by all modules in order to build an adapted library with only the
> necessary code used in the package. So we'd always end up with a complete
> library with all utility code, which is only really interesting for larger
> packages with several Cython modules.
>
> I agree with Robert that a CEP would be needed for this, both for clearing
> up the implications and actual use cases (I know that Sage is a reasonable
> use case, but it's also a rather special case).
>
>
>> This will decrease Cython and C
>> compile time, and will also make executables smaller.
>
> I don't see how this actually impacts executables. However, a self-contained
> executable is a value in itself.
>
>
>> This could be
>> enabled using a command line option to Cython, as well as with
>> distutils. Eventually we may decide to make it the default (let's
>> figure that out later). Preferably libcython.so would be installed
>> alongside libpython.so and cython.h inside the Python include
>> directory.
>
> I don't see this happening. It's easy for Python (there is only one Python
> running at a time, with one libpython loaded), but it's a lot less safe for
> different versions of a Cython library that are used by different modules
> inside of the running Python. For example, we'd have to version all visible
> symbols in operating systems with flat namespaces, in order to support
> loading multiple versions of the library.
>
>
>> Lastly, I think we also should figure out a way to serialize Entry
>> objects from CythonUtilities, which could easily and swiftly be loaded
>> when creating the cython scope. It's quite a pain to declare all
>> entries for utilities you write manually
>
> Why would you declare them manually? I thought everything would be moved out
> into the utility code files?
>

Right, the code is in the utility files. However, the cython scope
needs to have the entries of the classes and functions of the
utilities. e.g. the user may write

cimport cython

cdef cython.array myobject

For this to work, we need an 'array' entry, which we don't have yet,
as the utility code will be parsed at code generation time if an entry
of that utility code (which doesn't exist yet!) is used.

>> so what I mostly did was
>> parse the utility up to and including AnalyseDeclarationsTransform,
>> and then retrieve the entries from there.
>
> Sounds like a drawback regarding the processing time, but may still be a
> reasonable way to do it. I would expect that it won't be hard to pickle the
> resulting dict of entries into a cache file and rebuild it only when one of
> the utility files changes.

Exactly. I'm not sure about pickle though, but the details don't
matter. Pickle is certainly easy as long as you don't change your
interface (which we most certainly will, though).
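
A minimal sketch of such a cache, under stated assumptions: load_entries, utility_fingerprint and CACHE_FORMAT are hypothetical names, build_entries stands in for whatever parses the utilities and collects their entries, and the entries are assumed to be picklable. The format number is there to cope with interface changes: bumping it invalidates every old cache file.

```python
import hashlib
import pickle

CACHE_FORMAT = 1  # bump whenever the interface of the cached objects changes


def utility_fingerprint(paths):
    """Hash the names and contents of all utility files in a stable order."""
    digest = hashlib.sha1()
    for path in sorted(paths):
        digest.update(path.encode("utf-8"))
        with open(path, "rb") as f:
            digest.update(f.read())
    return digest.hexdigest()


def load_entries(cache_path, utility_paths, build_entries):
    """Return the cached entries, rebuilding if files or format changed."""
    key = (CACHE_FORMAT, utility_fingerprint(utility_paths))
    try:
        with open(cache_path, "rb") as f:
            cached_key, entries = pickle.load(f)
        if cached_key == key:
            return entries
    except (OSError, EOFError, pickle.UnpicklingError,
            AttributeError, ImportError, TypeError, ValueError):
        pass  # missing or unreadable cache: fall through and rebuild
    entries = build_entries(utility_paths)
    with open(cache_path, "wb") as f:
        pickle.dump((key, entries), f, protocol=pickle.HIGHEST_PROTOCOL)
    return entries
```

The first call pays the full parsing cost; later calls only read and hash the utility files, until one of them changes or CACHE_FORMAT is bumped.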

> Stefan