[pypy-commit] pypy vecopt: add a new doc file to describe vecopt, describe what the current opt. is capable of and added some limitations

Tue Jun 23 09:33:35 CEST 2015

Author: Richard Plangger <rich at pasra.at>
Branch: vecopt
Changeset: r78250:77f58b1dcf7f
Date: 2015-06-23 09:33 +0200
http://bitbucket.org/pypy/pypy/changeset/77f58b1dcf7f/

Log:	add a new doc file to describe vecopt, describe what the current
	opt. is capable of and added some limitations

diff --git a/rpython/doc/jit/vectorization.rst b/rpython/doc/jit/vectorization.rst
new file mode 100644
--- /dev/null
+++ b/rpython/doc/jit/vectorization.rst
@@ -0,0 +1,45 @@
+
+Vectorization
+=============
+
+TBA
+
+Features
+--------
+
+Currently, the following operations can be vectorized if the trace contains enough parallelism:
+
+* float32/float64: add, subtract, multiply, divide, negate, absolute
+* int8/int16/int32/int64 arithmetic: add, subtract, multiply, negate, absolute
+* int8/int16/int32/int64 logical: and, or, xor
+
+Reduction is implemented:
+
+* sum
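+
+The following sketch models what a vectorized sum does, assuming a vector of
+four lanes. It is plain Python for illustration only. ``vectorized_sum`` and
+its ``lanes`` parameter are hypothetical and are not part of the optimizer:
+
+.. code-block:: python
+
+    def vectorized_sum(values, lanes=4):
+        # Hypothetical model of a SIMD sum reduction, not PyPy's implementation.
+        # One partial sum per lane, like a vector register initialised to zero.
+        acc = [0] * lanes
+        n = len(values)
+        full = n - n % lanes
+        # Vector loop: each step adds one vector of adjacent elements lane-wise.
+        for i in range(0, full, lanes):
+            for lane in range(lanes):
+                acc[lane] += values[i + lane]
+        # After the loop, a horizontal add combines the lanes into one scalar.
+        total = sum(acc)
+        # Scalar tail for the elements that do not fill a whole vector.
+        for i in range(full, n):
+            total += values[i]
+        return total
+
+For example, ``vectorized_sum(list(range(10)))`` returns ``45``. Because the
+additions are reordered, the result for floating-point inputs may differ from a
+strictly sequential sum in the last bits.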
+
+Planned reductions:
+
+* all, any, prod, min, max
+
+To find parallel instructions, the tracer must provide enough information about
+memory load/store operations: they must be adjacent in memory. This requires
+that they use the same index variable and that their offsets can be expressed
+as a linear or affine combination of it.
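+
+The adjacency requirement can be illustrated with a small model in which every
+load/store indexes its array with ``coeff * var + const``. The names ``Access``,
+``adjacent`` and ``pack`` are hypothetical. This is a sketch of the idea, not
+the interface of the dependency analysis:
+
+.. code-block:: python
+
+    from collections import namedtuple
+
+    # Hypothetical model: the element index is coeff * var + const.
+    Access = namedtuple("Access", "array var coeff const")
+
+    def adjacent(a, b):
+        # True if b touches the element directly after the one a touches.
+        return (a.array == b.array
+                and a.var == b.var           # same index variable
+                and a.coeff == b.coeff       # same linear part
+                and b.const - a.const == 1)  # offsets differ by one element
+
+    def pack(accesses):
+        # Group accesses into runs of adjacent memory locations.
+        ordered = sorted(accesses)
+        packs = []
+        for acc in ordered:
+            if packs and adjacent(packs[-1][-1], acc):
+                packs[-1].append(acc)
+            else:
+                packs.append([acc])
+        return packs
+
+Unrolling ``a[i]`` four times yields constants 0 to 3 on the same index
+variable, so ``pack`` returns a single group of four accesses that could become
+one vector load. ``a[i]`` and ``a[2*i + 1]`` never form a group because their
+linear parts differ.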
+
+Unrolled guards are strengthened on an arithmetical level (see GuardStrengthenOpt).
+The resulting vector trace will only have one guard that checks the index.
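+
+For example, unrolling a loop four times may produce guards that can be written
+as ``i + 0 < n`` through ``i + 3 < n``. Since ``i + 3 < n`` implies the other
+three, only that guard needs to remain. The sketch below models this on a
+hypothetical ``Guard`` tuple. It is not the GuardStrengthenOpt API:
+
+.. code-block:: python
+
+    from collections import namedtuple
+
+    # Hypothetical representation of a guard: var + const < bound.
+    Guard = namedtuple("Guard", "var const bound")
+
+    def strengthen(guards):
+        # Keep only the guard with the largest constant per (var, bound):
+        # if var + c < bound holds, it also holds for every smaller c.
+        strongest = {}
+        for g in guards:
+            key = (g.var, g.bound)
+            if key not in strongest or g.const > strongest[key].const:
+                strongest[key] = g
+        return list(strongest.values())
+
+Here ``strengthen([Guard("i", k, "n") for k in range(4)])`` returns
+``[Guard(var='i', const=3, bound='n')]``.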
+
+Calculations on the index variable that become redundant once the load/store
+instructions are merged are not removed by the optimizer. Instead, the backend
+removes these instructions while assembling the trace.
+
+
+Future Work and Limitations
+---------------------------
+
+* The only SIMD instruction set currently supported is SSE4.1.
+* Loops that convert types from int(8|16|32|64) to int(8|16) are not supported in
+  the current SSE4.1 assembler implementation.
+  The required opcode spans multiple instructions. In terms of performance,
+  there might be little to no advantage in using SIMD instructions for these
+  conversions.
diff --git a/rpython/jit/metainterp/optimizeopt/vectorize.py b/rpython/jit/metainterp/optimizeopt/vectorize.py
--- a/rpython/jit/metainterp/optimizeopt/vectorize.py
+++ b/rpython/jit/metainterp/optimizeopt/vectorize.py
@@ -1,3 +1,10 @@
+"""
+This is the core of the vec. optimization. It combines dependency.py and schedule.py
+to rewrite a loop in vectorized form.
+
+See the RPython documentation for higher-level details.
+"""
+
 import py
 
 from rpython.jit.metainterp.resume import Snapshot