[pypy-dev] What can Cython do for PyPy?

Paolo Giarrusso p.giarrusso at gmail.com
Thu Aug 12 17:35:40 CEST 2010


I agree with the motivations given by Stefan - two interesting
possibilities would be to:
a) first, test the compatibility layer with Cython-generated code;

b) then, possibly, allow users to keep using the Python API while
replacing refcounting with another, more meaningful, PyPy-specific
API* for a garbage-collected heap.

However, such an API would be radically different, and I'm not sure
how well it would mesh with the CPython API. If Cython could support
such an API, that would be great. But I'm unsure whether this is
worth it, for Cython, and more generally for other modules (one could
easily and elegantly support both CPython and PyPy with preprocessor
tricks, as sketched below).
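
To illustrate the preprocessor tricks I mean: a minimal sketch, in C,
assuming only that PyPy's cpyext headers define a PYPY_VERSION macro
(I believe they do); the MYMOD_* macros and the PyPyGC_* functions
are hypothetical stand-ins for the GC-aware API proposed below.

#include <Python.h>

#ifdef PYPY_VERSION
  /* Hypothetical GC-aware reference API (see footnote below). */
  #define MYMOD_RETAIN(obj)  PyPyGC_NewRef(obj)
  #define MYMOD_RELEASE(obj) PyPyGC_DelRef(obj)
#else
  /* Plain CPython refcounting. */
  #define MYMOD_RETAIN(obj)  Py_INCREF(obj)
  #define MYMOD_RELEASE(obj) Py_DECREF(obj)
#endif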

See further below for why call overhead is not the biggest
performance problem when inlining is impossible.

* I thought the Java Native Interface (JNI) design of local and global
references (http://download.oracle.com/javase/6/docs/technotes/guides/jni/spec/design.html#wp16785)
would work here, with some adaptation.
However, if your moving GCs support pinning of objects, as I expect to
be necessary to interact with CPython code, I would make one important
change to that API: instead of having object references be pointers to
(GC-movable) pointers to objects, as in the JNI API, PyPy should use
plain pinned pointers. The pinning would not be apparent in the type,
but that should be fine, I guess.
Problems arise when PyPy-aware code calls code which still uses the
refcounting API. It is mostly safe to ignore the refcounting (even the
decrements) for local references, but I'm unsure about persistent
references; even there, letting the PyPy-aware code handle the
lifecycle by itself is probably still the best solution.
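
To make the footnote concrete: a minimal sketch of what such an API
could look like, modeled on JNI's local/global references but using
plain pinned pointers as argued above. All names here (PyPyObject,
PyPy_NewGlobalRef, and so on) are hypothetical, not an actual PyPy
API.

/* A reference is a plain pointer to a pinned object; unlike JNI's
 * indirect handles, existing pointer-based C code keeps working. */
typedef struct PyPyObject PyPyObject;
typedef PyPyObject *PyPyRef;

/* Local references are valid only until the current call into C
 * returns; the runtime tracks them in a per-call frame, so no
 * explicit release is needed. */
PyPyRef PyPy_NewLocalRef(PyPyRef obj);

/* Global references pin the object and keep it alive until deleted
 * explicitly, like JNI's NewGlobalRef/DeleteGlobalRef. */
PyPyRef PyPy_NewGlobalRef(PyPyRef obj);
void    PyPy_DeleteGlobalRef(PyPyRef ref);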

On Thu, Aug 12, 2010 at 11:25, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Maciej Fijalkowski, 12.08.2010 10:05:
>> On Thu, Aug 12, 2010 at 8:49 AM, Stefan Behnel wrote:

> If you only use it to call into non-trivial Cython code (e.g. some heavy
> calculations on NumPy tables), the call overhead should be mostly
> negligible, maybe even close to that in CPython. You could even provide
> some kind of fast-path to 'cpdef' functions (i.e. functions that are
> callable from both C and Python) and 'api' functions (which are currently
> exported at the module API level using the PyCapsule mechanism). That would
> reduce the call overhead to that of a C call.

>> but it's also unjittable. This means that to the JIT, a CPython
>> extension is like a black box which should not be touched.

> Well, unless both sides learn about each other, that is. It won't
> necessarily impact the JIT, but then again, a JIT usually won't have a
> noticeable impact on the performance of Cython code anyway.

Call overhead is not the biggest problem, I guess (well, if it's much
bigger than that of a plain C call, it might be); IMHO it's the minor
problem when you can't inline. Inlining is important because it
enables further optimizations on the combined code. This might or
might not apply to your typical use cases (present and future), but
you should keep this issue in mind, too. Whenever you say "if you
only use it to call into non-trivial Cython code", you imply that one
kind of functional abstraction, the one where you write short
functions such as accessors, is not efficiently supported.

For instance, if you call two functions, each containing a parallel
for loop, fusing the loops requires inlining the functions to expose
the loops (see the sketch below).
Inlining accessors (getters and setters) lets the compiler recognize
that they often don't need to be called over and over again, i.e., it
enables common subexpression elimination, which you can't do on calls
to a normal (impure) function.
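
A minimal sketch of the loop-fusion case, in C for concreteness (all
function names are made up): without inlining, the compiler sees two
opaque calls and must keep two separate passes over the array; after
inlining, it can fuse them into one.

void scale(double *a, int n) {
  for (int i = 0; i < n; i++)
    a[i] *= 2.0;
}

void shift(double *a, int n) {
  for (int i = 0; i < n; i++)
    a[i] += 1.0;
}

/* Two separate passes over the array... */
void transform(double *a, int n) {
  scale(a, n);
  shift(a, n);
}

/* ...which, once both calls are inlined, the optimizer can fuse
 * into a single pass: */
void transform_fused(double *a, int n) {
  for (int i = 0; i < n; i++)
    a[i] = a[i] * 2.0 + 1.0;
}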

To make a particularly dramatic example (since it comes from C) of a
quadratic-to-linear optimization: a loop like
for (i = 0; i < strlen(s); i++) {
  // do something with s without modifying it
}

takes quadratic time, because strlen takes linear time and is called
on every iteration. Can the optimizer fix this? The simplest way is
to inline everything; then it can notice that computing strlen only
once is safe, and hoist the call out of the loop. In C with GCC
extensions, one could annotate strlen as pure and have the loop body
access s only through const parameters (but I'm unsure whether this
actually works). In Python (and even in Java), something like this
should work without any annotations.
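
For reference, a sketch of the GCC annotation I have in mind;
my_strlen is a made-up stand-in (glibc's headers already declare the
real strlen as pure, if I remember correctly), and GCC will only
hoist the call if it can also prove that the loop body does not
modify s.

#include <stddef.h>

/* "pure" tells GCC the result depends only on the arguments and on
 * memory that is not modified between calls, with no side effects,
 * so repeated calls with the same argument can be merged. */
size_t my_strlen(const char *s) __attribute__((pure));

size_t my_strlen(const char *s) {
  size_t n = 0;
  while (s[n] != '\0')
    n++;
  return n;
}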

Of course, one can't rely on this quadratic-to-linear optimization
unless it's guaranteed to work (as tail call elimination is, in some
languages), so I wouldn't depend on it in this case; this point
relates to the wider issue of unreliable optimizations and
"sufficiently smart compilers", better discussed at
http://prog21.dadgum.com/40.html (not mine).
-- 
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/


