[pypy-commit] pypy vecopt-merge: documentation additions (command line flags), added description of the ABC optimization, note on limitations

plan_rich noreply at buildbot.pypy.org
Fri Aug 21 15:43:18 CEST 2015


Author: Richard Plangger <rich at pasra.at>
Branch: vecopt-merge
Changeset: r79117:3c01987639c7
Date: 2015-08-21 15:43 +0200
http://bitbucket.org/pypy/pypy/changeset/3c01987639c7/

Log:	documentation additions (command line flags), added description of
	the ABC optimization, note on limitations

diff --git a/rpython/doc/jit/index.rst b/rpython/doc/jit/index.rst
--- a/rpython/doc/jit/index.rst
+++ b/rpython/doc/jit/index.rst
@@ -25,6 +25,7 @@
    pyjitpl5
    optimizer
    virtualizable
+   vectorization
 
 - :doc:`Overview <overview>`: motivating our approach
 
diff --git a/rpython/doc/jit/optimizer.rst b/rpython/doc/jit/optimizer.rst
--- a/rpython/doc/jit/optimizer.rst
+++ b/rpython/doc/jit/optimizer.rst
@@ -178,6 +178,10 @@
 It is prepended to all optimizations and thus extends the Optimizer class
 and unrolls the loop once before it proceeds.
 
+Vectorization
+-------------
+
+- :doc:`Vectorization <vectorization>`
 
 What is missing from this document
 ----------------------------------
diff --git a/rpython/doc/jit/vectorization.rst b/rpython/doc/jit/vectorization.rst
--- a/rpython/doc/jit/vectorization.rst
+++ b/rpython/doc/jit/vectorization.rst
@@ -7,6 +7,18 @@
 that is that they use the same index variable and offset can be expressed as a
 linear or affine combination.
 
+Command line flags:
+
+* --jit vec=1: turns on vectorization for marked jit drivers
+  (e.g. those in the NumPyPy module).
+* --jit vec_all=1: turns on vectorization for any jit driver. See the parameters
+  below for the heuristics that filter traces.
+* --jit vec_ratio=2: a number from 0 to 10 that represents a real number (vec_ratio / 10).
+  It filters traces if vec_all is enabled. Let N be the number of operations in the trace
+  and M the number of vector transformable operations (int_add -> vec_int_add); then the
+  following must hold: M / N >= (vec_ratio / 10)
+* --jit vec_length=60: The maximum number of trace instructions the vectorizer filters for.
+
 Features
 --------
 
@@ -38,6 +50,28 @@
 load/store instructions) are not removed. The backend removes these instructions
 while assembling the trace.
 
+In addition, a simple heuristic (enabled by --jit vec_all=1) tries to remove
+array bounds checks for application level loops. It identifies the array
+bounds checks and adds a transitive guard at the top of the loop::
+
+    label(...)
+    ...
+    guard(i < n) # index guard
+    ...
+    guard(i < len(a))
+    a = load(..., i, ...)
+    ...
+    jump(...)
+    # becomes
+    guard(n < len(a))
+    label(...)
+    guard(i < n) # index guard
+    ...
+    a = load(..., i, ...)
+    ...
+    jump(...)
+
+
 
 Future Work and Limitations
 ---------------------------
@@ -54,5 +88,9 @@
   to have 2 xmm registers (one filled with zero bits and the other with every bit set to one).
   This cuts down 2 instructions for guard checking, trading for higher register pressure.
 * prod, sum are only supported by 64 bit data types
+* the isomorphic function prevents the following cases from being combined into a pair:
+  1) getarrayitem_gc, getarrayitem_gc_pure
+  2) int_add(v,1), int_sub(v,-1)
 
 .. _PMUL: http://stackoverflow.com/questions/8866973/can-long-integer-routines-benefit-from-sse/8867025#8867025
+
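Note on the vec_ratio filter documented in the vectorization.rst hunk: the condition M / N >= (vec_ratio / 10) can be sketched in plain Python as follows. This is only a minimal illustration, not the RPython implementation. The function name `passes_vec_ratio`, its parameters, and the way operations are represented as strings are all assumptions made for this example.

```python
def passes_vec_ratio(trace_ops, vectorizable_ops, vec_ratio=2):
    """Return True if a trace would pass the documented ratio filter.

    Hypothetical helper: trace_ops is a list of operation names (N = its
    length), vectorizable_ops is the set of names that have a vector
    counterpart (M = how many trace operations fall in that set).
    """
    n = len(trace_ops)
    if n == 0:
        # An empty trace has nothing to vectorize.
        return False
    m = sum(1 for op in trace_ops if op in vectorizable_ops)
    # M / N >= vec_ratio / 10, rewritten with integers to avoid rounding.
    return m * 10 >= vec_ratio * n


# Illustrative trace: 2 of 5 operations are vector transformable.
example_trace = ["label", "int_add", "guard_true", "int_add", "jump"]
example_vectorizable = {"int_add"}
print(passes_vec_ratio(example_trace, example_vectorizable, vec_ratio=2))
print(passes_vec_ratio(example_trace, example_vectorizable, vec_ratio=5))
```

With the default vec_ratio=2, a trace is kept when at least 20% of its operations are vector transformable. Raising the value makes the filter stricter.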

