[Numpy-discussion] Fwd: Numexpr-3.0 proposal

Robert McLeod robbmcleod at gmail.com
Tue Feb 16 04:04:17 EST 2016


On Mon, Feb 15, 2016 at 10:43 AM, Gregor Thalhammer <
gregor.thalhammer at gmail.com> wrote:

>
> Dear Robert,
>
> thanks for your effort on improving numexpr. Indeed, vectorized math
> libraries (VML) can give a large performance boost (~5x), except for a
> few basic operations (add, mul, div), which current compilers are able to
> vectorize automatically. With recent gcc even more functions are
> vectorized; see https://sourceware.org/glibc/wiki/libmvec. But you need
> special flags depending on the platform (SSE, AVX present?); runtime
> detection of processor capabilities would be nice for distributing
> binaries. Some time ago, because I lost access to Intel's MKL, I patched
> numexpr to use Accelerate/vecLib on OS X, which is preinstalled on every
> Mac; see the veclib_support branch of https://github.com/geggo/numexpr.git.
>
> Since you increased the opcode size, I could imagine providing a bit to
> switch (at runtime) between internal functions and vectorized ones; that
> would be handy for tests and benchmarks.
>
>

Dear Gregor,

Your suggestion to separate the opcode signature from the library used to
execute it is very clever. Building on it, I think the natural evolution of
the opcodes is to specify them by function signature and library, using a
two-level dict, i.e.:

numexpr.interpreter.opcodes['exp_f8f8f8'][gnu]   = some_enum
numexpr.interpreter.opcodes['exp_f8f8f8'][msvc]  = some_enum + 1
numexpr.interpreter.opcodes['exp_f8f8f8'][vml]   = some_enum + 2
numexpr.interpreter.opcodes['exp_f8f8f8'][yeppp] = some_enum + 3
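
As a rough sketch of how such a two-level table might be built procedurally
and turned into the C defines for opcodes.cpp (all names below are
hypothetical, for illustration only):

from itertools import count

# hypothetical subsets of libraries and function signatures
libraries  = ['gnu', 'msvc', 'vml', 'yeppp']
signatures = ['exp_f8f8f8', 'log_f8f8f8', 'mul_f8f8f8']

# assign a sequential enum to every (signature, library) pair
enum = count()
opcodes = {sig: {lib: next(enum) for lib in libraries}
           for sig in signatures}

# emit the corresponding defines, e.g. "#define OP_EXP_F8F8F8_VML 2"
for sig, by_lib in opcodes.items():
    for lib, code in by_lib.items():
        print('#define OP_%s_%s %d' % (sig.upper(), lib.upper(), code))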

I want to procedurally generate opcodes.cpp and interpreter_body.cpp. If it
is done the way you suggest, funccodes.hpp and the many #defines for
function codes in the interpreter can hopefully be removed, which would
simplify the overall codebase. One could potentially take this a step
further and plan (optimize) each expression, similar to what FFTW does with
regard to transform shape.

The basic way to control the library would be a single library argument
applied to the whole expression, i.e.:

result = ne.evaluate("A*log(foo**2 / bar**2)", lib=vml)

However, we could also permit a tuple to be passed in, where each element
specifies the library to use for the corresponding operation in the AST:

result = ne.evaluate("A*log(foo**2 / bar**2)", lib=(gnu, gnu, gnu, yeppp, gnu))

In this case the ops are (mul, mul, div, log, mul), since the squared terms
compile to multiplications. The opcode picking is done on the Python side,
and this tuple could potentially be optimized by numexpr itself, rather
than by hand, by trying various permutations of the linked C math
libraries. The wisdom from such planning could be pickled and saved to a
wisdom file. Currently numexpr has a cacheDict in util.py, and there's no
reason it can't be pickled and saved to disk. I've already done something
similar with wrappers for PyFFTW.
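
A rough sketch of what such a pickled-wisdom planner might look like,
assuming the proposed lib= keyword existed (it does not today; every name
here is hypothetical):

import itertools
import pickle
import timeit
import numexpr as ne

def plan(expr, libs=('gnu', 'vml', 'yeppp'), n_ops=5,
         wisdom_file='ne_wisdom.pkl'):
    # load any previously saved wisdom
    try:
        with open(wisdom_file, 'rb') as f:
            wisdom = pickle.load(f)
    except (IOError, OSError):
        wisdom = {}
    if expr in wisdom:
        return wisdom[expr]
    # exhaustively time every assignment of libraries to ops; note that
    # ne.evaluate() pulls variables from the calling frame, so the arrays
    # named in expr are assumed to be module globals here
    best, best_t = None, float('inf')
    for combo in itertools.product(libs, repeat=n_ops):
        t = timeit.timeit(lambda: ne.evaluate(expr, lib=combo), number=10)
        if t < best_t:
            best, best_t = combo, t
    # save the winning combination for next time
    wisdom[expr] = best
    with open(wisdom_file, 'wb') as f:
        pickle.dump(wisdom, f)
    return best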

Robert

-- 
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcleod at unibas.ch
robert.mcleod at bsse.ethz.ch
robbmcleod at gmail.com

