[pypy-dev] numpypy array expressions

mark florisson markflorisson88 at gmail.com
Mon Aug 27 15:09:51 CEST 2012


Hey,

For this year's Summer of Code, and for my master's dissertation, we
created a project to compile array expressions efficiently (Dag was my
mentor, CCed). It can be found at
https://github.com/markflorisson88/minivect , and the thesis is in the
'thesis' subdirectory. It's currently integrated into Cython, and it can
significantly outperform open source and commercial Fortran compilers
on certain benchmarks (see the bench/graphs subdirectory on GitHub,
or the dissertation). On other benchmarks its performance is mostly
close to theirs, allowing for some runtime overhead.

The project is meant to be reusable: it's used by Cython, it will
likely be integrated into Theano, and it's currently partially
integrated into numba. It can generate C code for static compilers,
and it uses llvmpy to do just-in-time specialization for runtime
compilers (likely combined with some form of lazy evaluation).
However, I think PyPy has its own approach, which I believe relies on
the JIT to evaluate the expressions? (Feel free to point me to
documentation or source code.) I don't imagine it optimizes for the
cache hierarchy, though?

So the question is: would there be interest in using this project in
PyPy? It's currently meant to be used as a git submodule or included
verbatim, but I also intend to make it distributable and installable.
The project is designed so that adding new code generators requires
minimal effort. But if this type of code translation is not
compatible with PyPy's approach, you could still look at the
optimization techniques described in the thesis and prototyped in the
code on GitHub, such as tiling for spatial locality, SIMD transposes
when mixing C- and Fortran-contiguous arrays, a tractable way to prove
data independence for NumPy arrays in order to avoid array
temporaries, and optimizing broadcasting through loop-invariant code
motion.
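To make two of these techniques a bit more concrete, here is a minimal
pure-Python sketch of the loop structure they lead to. It is only an
illustration under stated assumptions, not minivect's generated code
or API: arrays are modeled as flat lists, and the function names and
the default tile size are hypothetical. Pure Python won't show the
cache effects themselves; the point is just the shape of the loops.

```python
# Hypothetical illustration only; not minivect's code or API.

def tiled_add(a, b, n_rows, n_cols, tile=32):
    """Return out = a + b as a flat C-ordered (row-major) list.

    a: flat list in C (row-major) order, element (i, j) at a[i * n_cols + j]
    b: flat list in Fortran (column-major) order, element (i, j) at b[i + j * n_rows]
    """
    out = [0.0] * (n_rows * n_cols)
    # Walk the iteration space in tile x tile blocks, so that within a
    # block both the row-major operand and the column-major operand
    # only touch a small, cache-sized region of memory.
    for ii in range(0, n_rows, tile):
        i_end = min(ii + tile, n_rows)
        for jj in range(0, n_cols, tile):
            j_end = min(jj + tile, n_cols)
            for i in range(ii, i_end):
                row = i * n_cols
                for j in range(jj, j_end):
                    out[row + j] = a[row + j] + b[i + j * n_rows]
    return out


def broadcast_mul_add(a, col, row, n_rows, n_cols):
    """Return out[i, j] = a[i, j] * col[i] + row[j] as a flat C-ordered list.

    col plays the role of a (n_rows, 1) array and row of a (1, n_cols)
    array, both broadcast against the C-ordered (n_rows, n_cols) array a.
    """
    out = [0.0] * (n_rows * n_cols)
    for i in range(n_rows):
        # col[i] does not depend on j, so it is hoisted out of the inner
        # loop (loop-invariant code motion) and loaded once per row.
        c = col[i]
        base = i * n_cols
        for j in range(n_cols):
            out[base + j] = a[base + j] * c + row[j]
    return out
```

In a real code generator the tile size would be chosen to fit the
cache; 32 is an arbitrary placeholder here.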

Anyway, it doesn't have all functionality yet; for instance, it doesn't
support reductions. But let us know whether PyPy would want to use it
and, if so, how we could collaborate to make minivect suit PyPy's
purposes.

Mark

