[Cython] Gsoc project
Dag Sverre Seljebotn
d.s.seljebotn at astro.uio.no
Wed Mar 28 05:05:25 CEST 2012
On 03/27/2012 02:17 PM, Philip Herron wrote:
> Hey
>
> I got linked to your idea
> http://groups.google.com/group/cython-users/browse_thread/thread/cb8aa58083173b97/cac3cf12d438b122?show_docid=cac3cf12d438b122&pli=1
> by David Malcolm on his plugin mailing list.
>
> I am looking to apply to GSoC once again this year. I did GSoC
> 2010 and 2011 on GCC, implementing my own GCC front-end for Python,
> which is still in very early stages since it's a huge task. But I am
> tempted to apply to this project to implement a more self-contained
> project, to give back to the community more promptly while hacking
> on my own front-end in my own time. And I think it would benefit me
> to understand in more detail different aspects of Python, which
> is what I need, and I would gain a lot of experience from it.
Excellent! After talking to lots of people at PyCon about Cython, it is
obvious that auto-generation of pxd files is *the* most missed feature
in Cython today. If you do this, lots of Cython users will be very grateful.
>
> I was wondering if you could give me some more details on how this
> could all work. I am not 100% familiar with Cython, but I think I
> understand it to a good extent from playing with it for most of my
> evening. I just want to make sure I fully understand the basic use
> case. A user could have something like:
>
> -header foo.h
>
> extern int add (int, int);
>
> -source foo.c
>
> #include "foo.h"
>
> int add (int x, int y)
> {
>     return x+y;
> }
>
> We use the plugin to go over the decls created and create a pxd file like:
>
> cdef int add (int a, int b):
>     return a + b
>
> Although this is a really basic example, I just want to make sure I
> understand what's going on. Maybe some more of you have input? I guess
> this would be best suited as a proposal for Python rather than GCC?
This isn't quite what should be done. Cython generates C code that
includes C header files; what the pxd files are needed for is to provide
declarations for Cython about what is available on the C side (during
the Cython->C translation/compilation).
So: "foo.c" is irrelevant to Cython. And, foo.h should turn into foo.pxd
like this:
    cdef extern from "foo.h":
        int add(int, int)
Let us know if you have any questions; you may want to look at examples
for using Cython to wrap C code, such as
https://github.com/zeromq/pyzmq/blob/master/zmq/core/libzmq.pxd
and the rest of the pyzmq code.
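
To make the foo.h -> foo.pxd mapping concrete, here is a minimal sketch in Python. It is only an illustration: a real generator would get its declarations from the compiler (e.g. through the gcc plugin), whereas this toy uses a regular expression that only understands one simple prototype per line. The names generate_pxd and _PROTOTYPE are made up for this example.

```python
import re

# Assumption: one plain function prototype per line, such as
# "extern int add (int, int);". No macros, function pointers or structs.
_PROTOTYPE = re.compile(
    r"^\s*(?:extern\s+)?"                # optional "extern"
    r"(?P<ret>[A-Za-z_][\w\s*]*?)"       # return type (shortest match)
    r"\s*\b(?P<name>[A-Za-z_]\w*)\s*"    # function name
    r"\((?P<args>[^()]*)\)\s*;\s*$"      # argument list and trailing ";"
)


def generate_pxd(header_name, header_text):
    """Return the text of a pxd file declaring the prototypes found."""
    lines = ['cdef extern from "%s":' % header_name]
    for raw in header_text.splitlines():
        m = _PROTOTYPE.match(raw)
        if m is None:
            continue  # anything the toy pattern does not understand is skipped
        args = m.group("args").strip()
        if args == "void":
            args = ""  # "f(void)" in C is written "f()" in the pxd
        else:
            args = ", ".join(a.strip() for a in args.split(","))
        lines.append("    %s %s(%s)" % (m.group("ret").strip(),
                                        m.group("name"), args))
    if len(lines) == 1:
        lines.append("    pass")  # keep the extern block syntactically valid
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_pxd("foo.h", "extern int add (int, int);\n"))
```

Note that foo.c never enters the picture; only the header is read.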
Moving over to the idea of making this a GSoC:
First, we have a policy of requiring patches from prospective students
in addition to their application. Often, this has been to fix a bug or
two in Cython. However, given that pxd generation can be done without
much digging into Cython itself, I think that something like a crude
prototype of the pxd generator (supporting only a subset of C) would be
a better fit (other devs, what do you think?)
The project should contain at least:
- The wrapper generator itself
- Tests for it (including the task of figuring out how to test this,
possibly both unit tests and integration tests)
- A strategy for testing it for all relevant versions of gcc; one
should probably set up Jenkins jobs for it
Even then, I feel that this is rather small for a full GSoC. Even when
supporting the subset of C++ supported by Cython, I would estimate a
month or so (and GSoC is two months). So it should be extended in one
direction or another. Some ideas:
- Very often one is not interested in the full header file. One really
wants "the API", not a translation of the C header. This probably
requires a) some heuristics, and b) the possibility to write, as easily
as possible, some selectors/configuration for what should and should
not be included. Making that end-user-friendly is perhaps a challenge,
I'm not sure.
One idea here is to make possible an interplay where you look at the pyx
file to see what needs to be wrapped. I.e. you first try to use a function in
the pyx file as if it had already been declared, then run the pxd
generator feeding in the pyx files (and .h files), and out comes the
required pxd file bridging the two (containing only the used subset).
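
A rough sketch of that pyx-driven selection, in Python. This is only meant to show the idea under strong assumptions: the declarations are already available as a dict from C function name to pxd line (the names used_names and select_declarations are invented here), and "used" simply means "appears in call position in the pyx source". A real implementation would want to look at Cython's parse tree rather than use a regular expression.

```python
import re


def used_names(pyx_source):
    """Collect identifiers that appear in call position, e.g. "add(".

    Assumption: a crude textual scan is good enough for the illustration;
    it also picks up names of functions defined in the pyx file itself.
    """
    return set(re.findall(r"\b([A-Za-z_]\w*)\s*\(", pyx_source))


def select_declarations(declarations, pyx_source):
    """Keep only the declarations whose name is called from the pyx file.

    declarations maps a C function name to its pxd declaration line.
    """
    wanted = used_names(pyx_source)
    return {name: decl for name, decl in declarations.items()
            if name in wanted}


if __name__ == "__main__":
    header_decls = {
        "add": "int add(int, int)",
        "sub": "int sub(int, int)",
    }
    pyx = "def py_add(a, b):\n    return add(a, b)\n"
    for decl in select_declarations(header_decls, pyx).values():
        print(decl)
```

Here only the declaration of add would end up in the generated pxd, since sub is never used from the pyx file.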
- Support using clang to parse C code, in addition to gcc
- There's a problem in that an often-used Cython approach is:
1) Generate C file from pyx and pxd files
2) Ship to other computers
3) Compile C file
However, this is fragile when combined with auto-generated pxd files,
because the resulting pxd may be different depending on whether -DFOO is
given to gcc or not.
The above 3 steps are possible because Cython often does not care about
the exact type of something, just basic type and signedness. So if you do
    cdef extern from "foo.h":
        ctypedef int sometype_t
then sometype_t can actually be a short or a char, and Cython doesn't
care. (Similarly, not all fields of a struct need to be exposed, only
the ones that form part of the API.)
However, I'm not sure if the quality of an auto-generated pxd file is
good enough for this approach.
So either a) the wrapper generator and Cython must be plugged into the
typical setup.py build, or b) one figures out something clever (or,
likely, more than one clever thing) which allows one to continue using
the above workflow.
Either a) or b), or both, could be part of the project. a) essentially
requires looking at Cython.Distutils. b) *may* involve hooking into gcc
*before* the preprocessor is run, so that #ifdef etc. can be taken into
account (if that is even possible), plus new features in Cython for
specifying in a pxd file that "there's an #ifdef here", and then seeing
if that can somehow result in intelligently generated C code.
PS. I should stress that a pxd generator is *very* useful -- because it
would do 90% of the job, and even if humans need to do the last 10% it
is still a major timesaver.
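
To illustrate the -DFOO fragility mentioned in this item, here is a toy sketch in Python. It is not how gcc works internally; it is a hypothetical, heavily simplified stand-in for the preprocessor (the function visible_lines is invented here and only handles #ifdef, #ifndef, #else and #endif) that shows how the set of declarations seen by a pxd generator changes with the defines.

```python
def visible_lines(header_text, defines):
    """Return the header lines that survive conditional compilation.

    Assumption: only #ifdef / #ifndef / #else / #endif are handled, and
    `defines` is the set of macro names passed as -DNAME.
    """
    stack = []  # one bool per open conditional: is its branch active?
    out = []
    for line in header_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#ifdef"):
            stack.append(stripped.split()[1] in defines)
        elif stripped.startswith("#ifndef"):
            stack.append(stripped.split()[1] not in defines)
        elif stripped.startswith("#else"):
            stack[-1] = not stack[-1]
        elif stripped.startswith("#endif"):
            stack.pop()
        elif stripped and all(stack):
            out.append(stripped)  # kept only if every enclosing branch is on
    return out


HEADER = """\
int add(int, int);
#ifdef FOO
int add_foo(int, int);
#endif
"""

if __name__ == "__main__":
    print(visible_lines(HEADER, set()))    # as if compiled without -DFOO
    print(visible_lines(HEADER, {"FOO"}))  # as if compiled with -DFOO
```

A pxd generated on a machine building with -DFOO would declare add_foo, and C code generated from it could then fail to compile on a machine that builds without -DFOO.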
- More straightforward than the above: Parse Fortran through the
gfortran GCC frontend. The Fwrap program
(https://github.com/fwrap/fwrap) has been dormant in terms of
development for the past couple of years, but is still the most
promising way of bringing Fortran and Cython together.
Part of Fwrap's problem is the existing parser. Switching to gfortran
as the parser would be spectacular, and would probably revive the
project. It has a solid test suite, so one would basically replace the
parser component of Fwrap, make sure the test suite passes, and that
would be it.
(Of course, few people outside the scientific community care at all
about Fortran.)
Those are some ideas. Remember: This is *your* project, so make sure you
focus on features you'd find fun to play with and implement. And do NOT
take all of the above, that's way too much :-), just find one or two
extra features that help make the GSoC application really appealing.
Dag