[pypy-svn] r26533 - pypy/dist/pypy/doc/discussion

Fri Apr 28 20:58:00 CEST 2006

Author: antocuni
Date: Fri Apr 28 20:57:52 2006
New Revision: 26533

Added:
   pypy/dist/pypy/doc/discussion/cli-optimizations.txt   (contents, props changed)
Log:
Some ideas on how to optimize gencli.



Added: pypy/dist/pypy/doc/discussion/cli-optimizations.txt
==============================================================================

--- (empty file)
+++ pypy/dist/pypy/doc/discussion/cli-optimizations.txt	Fri Apr 28 20:57:52 2006
@@ -0,0 +1,196 @@
+Possible optimizations for the CLI backend
+==========================================
+
+Stack push/pop optimization
+---------------------------
+
+The CLI's VM is a stack-based machine, which doesn't play nicely
+with the SSI form in which the flowgraphs are generated. At the moment
+gencli does a literal translation of the SSI statements, allocating a
+new local variable for each variable of the flowgraph.
+
+For example, consider the following RPython code and the corresponding
+flowgraph::
+
+  def bar(x, y):
+      foo(x+y, x-y)
+
+
+  inputargs: x_0 y_0
+  v0 = int_add(x_0, y_0)
+  v1 = int_sub(x_0, y_0)
+  v2 = directcall((sm foo), v0, v1)
+
+This is the IL code generated by the CLI backend::
+
+  .locals init (int32 v0, int32 v1, int32 v2)
+    
+  block0:
+    ldarg 'x_0'
+    ldarg 'y_0'
+    add 
+    stloc 'v0'
+    ldarg 'x_0'
+    ldarg 'y_0'
+    sub 
+    stloc 'v1'
+    ldloc 'v0'
+    ldloc 'v1'
+    call int32 foo(int32, int32)
+    stloc 'v2'
+
+As you can see, the results of 'add' and 'sub' are stored in v0 and
+v1, respectively, and then v0 and v1 are reloaded onto the stack. These
+stores and loads are redundant, since the code would work just as well
+without them::
+
+  .locals init (int32 v2)
+    
+  block0:
+    ldarg 'x_0'
+    ldarg 'y_0'
+    add 
+    ldarg 'x_0'
+    ldarg 'y_0'
+    sub 
+    call int32 foo(int32, int32)
+    stloc 'v2'
+
+I've checked the native code generated by the Mono JIT on x86, and
+it does not optimize this redundancy away. I haven't checked the
+native code generated by the Microsoft CLR yet.
+
+Thus, we might consider optimizing it manually; it should not be too
+difficult, but it is not trivial because we have to make sure that the
+dropped locals are used only once, and that their values are consumed
+in the same order in which they were pushed onto the stack.
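
A minimal sketch of the idea in Python, working on a hypothetical block representation (`Op`, `emit_block` and the string IL output are illustrative names, not gencli's actual API). It keeps a result on the stack only if it is used exactly once inside the block, then simulates the evaluation stack to check that each kept value is on top, in argument order, when its consumer runs; values that fail the check go back to being locals.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Set, Tuple


class Op(NamedTuple):
    """One SSI operation of a block: result = opname(*args) (hypothetical)."""
    result: str
    opname: str            # IL instruction implementing the operation
    args: Tuple[str, ...]


def _top_matches(pending: List[str], args: Tuple[str, ...]) -> int:
    """Largest k such that the k topmost stack values are exactly args[:k]."""
    for k in range(min(len(pending), len(args)), 0, -1):
        if pending[-k:] == list(args[:k]):
            return k
    return 0


def _walk(ops: List[Op], inputargs: Set[str], keep: Set[str],
          code: Optional[List[str]] = None) -> Set[str]:
    """Simulate the evaluation stack, optionally appending IL to `code`.

    Returns the kept variables that are not reachable when needed
    (buried or out of order); those must be stored in locals after all.
    """
    pending: List[str] = []      # values left on the stack, bottom to top
    unreachable: Set[str] = set()
    for op in ops:
        k = _top_matches(pending, op.args)
        del pending[len(pending) - k:]       # consumed straight from the stack
        for a in op.args[k:]:
            if a in keep:
                unreachable.add(a)
            elif code is not None:
                code.append(f"{'ldarg' if a in inputargs else 'ldloc'} '{a}'")
        if code is not None:
            code.append(op.opname)
        if op.result in keep:
            pending.append(op.result)        # leave the result on the stack
        elif code is not None:
            code.append(f"stloc '{op.result}'")
    return unreachable


def emit_block(ops: List[Op], inputargs: Iterable[str],
               live_out: Iterable[str] = ()) -> Tuple[List[str], List[str]]:
    """Return (locals, IL lines) for a block without redundant stloc/ldloc."""
    args_set, out_set = set(inputargs), set(live_out)
    uses: Dict[str, int] = {}
    for op in ops:
        for a in op.args:
            uses[a] = uses.get(a, 0) + 1
    # Only values used exactly once, and only inside this block, may
    # stay on the stack instead of going through a local.
    keep = {op.result for op in ops
            if uses.get(op.result, 0) == 1 and op.result not in out_set}
    while True:   # shrink `keep` until every kept value is reachable
        unreachable = _walk(ops, args_set, keep)
        if not unreachable:
            break
        keep -= unreachable
    code: List[str] = []
    _walk(ops, args_set, keep, code)
    return [op.result for op in ops if op.result not in keep], code
```

Fed with the three operations of `bar` above, this reproduces the shorter IL listing with only v2 as a local. If the operands reached the call in reverse order (a call taking v1, v0), v0 would be kept in a local, because it is buried below v1 when the call needs it first.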
+
+
+Mapping RPython exceptions to native CLI exceptions
+---------------------------------------------------
+
+Both RPython and CLI have their own set of exception classes: some of
+these are pretty similar; e.g., we have OverflowError,
+ZeroDivisionError and IndexError on the RPython side and
+OverflowException, DivideByZeroException and IndexOutOfRangeException
+on the CLI side.
+
+The first attempt was to map RPython classes to their corresponding
+CLI ones: this worked for simple cases, but it would have triggered
+subtle bugs in more complex ones, because the two exception
+hierarchies don't completely overlap.
+
+For now I've chosen to build an RPython exception hierarchy
+completely independent from the CLI one, but this means that we can't
+rely on exceptions raised by standard operations. The currently
+implemented solution is to do an exception translation on-the-fly; for
+example, the 'int_add_ovf' operation is translated into the following
+IL code::
+
+  .try 
+  { 
+      ldarg 'x_0'
+      ldarg 'y_0'
+      add.ovf 
+      stloc 'v1'
+      leave __check_block_2 
+  } 
+  catch [mscorlib]System.OverflowException 
+  { 
+      newobj instance void class exceptions.OverflowError::.ctor() 
+      dup 
+      ldsfld class Object_meta pypy.runtime.Constants::exceptions_OverflowError_meta 
+      stfld class Object_meta Object::meta 
+      throw 
+  } 
+
+I.e., it catches the builtin OverflowException and raises an RPython
+OverflowError.
+
+I haven't measured timings yet, but I suspect that this machinery
+incurs some performance penalty even in the non-overflow case; a
+possible optimization is to do the on-the-fly translation only when it
+is strictly necessary, i.e. to skip it when the except clause catches
+an exception class whose subclass hierarchy is compatible with the
+builtin one. As an example, consider the following RPython code::
+
+  try:
+    return mylist[0]
+  except IndexError:
+    return -1
+
+Given that IndexError has no subclasses, we can map it to
+IndexOutOfRangeException and catch that one directly::
+
+  .try
+  {
+    ldloc 'mylist'
+    ldc.i4 0
+    call int32 getitem(MyListType, int32)
+    ...
+  }
+  catch [mscorlib]System.IndexOutOfRangeException
+  {
+    // return -1
+    ...
+  }
+
+By contrast, we can't do so if the except clause catches classes that
+don't directly map to any builtin class, such as LookupError::
+
+  try:
+    return mylist[0]
+  except LookupError:
+    return -1
+
+Such a clause has to be translated in the old way::
+
+  .try
+  {
+    .try
+    {
+        ldloc 'mylist'
+        ldc.i4 0
+        call int32 getitem(MyListType, int32)
+    }
+    catch [mscorlib]System.IndexOutOfRangeException
+    {
+        // translate IndexOutOfRangeException into IndexError
+        newobj instance void class exceptions.IndexError::.ctor()
+        dup
+        ldsfld class Object_meta pypy.runtime.Constants::exceptions_IndexError_meta
+        stfld class Object_meta Object::meta
+        throw
+    }
+    ...
+  }
+  catch exceptions.LookupError
+  {
+    // return -1
+    ...
+  }
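
The rule above can be sketched as a small decision function. This is a sketch only: the hierarchy is a hypothetical excerpt modelled on Python's built-in exceptions, and the mapping table contains just the three pairs named earlier in this section (`native_catch_target` and both tables are illustrative names).

```python
from typing import Dict, Optional

# Hypothetical excerpt of the RPython exception hierarchy (class -> parent).
RPYTHON_PARENT: Dict[str, Optional[str]] = {
    "Exception": None,
    "LookupError": "Exception",
    "IndexError": "LookupError",
    "KeyError": "LookupError",
    "ArithmeticError": "Exception",
    "OverflowError": "ArithmeticError",
    "ZeroDivisionError": "ArithmeticError",
}

# Only the similar pairs named in the text; everything else has no
# direct CLI counterpart.
CLI_EQUIVALENT: Dict[str, str] = {
    "OverflowError": "[mscorlib]System.OverflowException",
    "ZeroDivisionError": "[mscorlib]System.DivideByZeroException",
    "IndexError": "[mscorlib]System.IndexOutOfRangeException",
}


def has_subclasses(cls: str) -> bool:
    return any(parent == cls for parent in RPYTHON_PARENT.values())


def native_catch_target(caught: str) -> Optional[str]:
    """CLI class that an 'except caught:' clause can catch directly,
    or None when the builtin exception must be translated on the fly."""
    if caught in CLI_EQUIVALENT and not has_subclasses(caught):
        return CLI_EQUIVALENT[caught]
    return None
```

The "no subclasses" test is the strict form of the compatibility condition; a class whose subclasses all had matching CLI subclasses could in principle be caught natively too, but the sketch does not attempt that.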
+
+
+Specializing methods of List
+----------------------------
+
+Most methods of RPython lists are implemented by ll_* helpers placed
+in rpython/rlist.py. For some of them there is a direct counterpart
+already implemented in the .NET List<>; we could use the oopspec
+attribute to replace these low-level helpers on the fly with their
+builtin counterparts. For example, the 'append' method is already
+mapped to pypylib.List.append. Thanks to Armin Rigo for the idea of
+using oopspec.
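
A sketch of how the backend could pick the call target from an oopspec. The string format 'list.append(l, newitem)', the lookup table and the stand-in helpers are assumptions made for illustration; only the append to pypylib.List.append mapping comes from the text.

```python
from typing import Callable, Dict

# Hypothetical table of list operations with a builtin counterpart.
BUILTIN_LIST_METHODS: Dict[str, str] = {
    "list.append": "pypylib.List.append",
}


def ll_append(l, newitem):
    """Stand-in for the low-level append helper in rpython/rlist.py."""
    l.append(newitem)


# The helper is tagged with a string describing the operation it performs.
ll_append.oopspec = "list.append(l, newitem)"


def ll_reverse(l):
    """Stand-in helper without an oopspec: it is always called as-is."""
    l.reverse()


def call_target(helper: Callable) -> str:
    """Name the backend should call: the builtin method when the helper's
    oopspec has a counterpart, otherwise the helper itself."""
    spec = getattr(helper, "oopspec", None)
    if spec is not None:
        opname = spec.split("(", 1)[0].strip()
        if opname in BUILTIN_LIST_METHODS:
            return BUILTIN_LIST_METHODS[opname]
    return helper.__name__
```

Helpers without a matching entry fall back to an ordinary call, so the table can grow one method at a time.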
+
+
+Doing some caching on Dict
+--------------------------
+
+The current implementations of ll_dict_getitem and ll_dict_get in
+ootypesystem.rdict do two consecutive lookups (calling ll_contains and
+ll_get) on the same key. We might cache the result of
+pypylib.Dict.ll_contains so that the subsequent ll_get doesn't need a
+lookup. Alternatively, we could refactor ootypesystem.rdict directly
+so that it does a single lookup. Either way, we need some profiling
+before choosing the best approach.
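
A minimal sketch of the caching variant, using a hypothetical CachingDict wrapper around a Python dict in place of pypylib.Dict; the `lookups` counter exists only to make the saved lookup visible.

```python
from typing import Any, Dict, Hashable

_NO_KEY = object()   # sentinel: nothing cached


class CachingDict:
    """Hypothetical dict that remembers the entry found by the last
    successful ll_contains, so that an ll_get on the same key right
    after it needs no second lookup."""

    def __init__(self) -> None:
        self._items: Dict[Hashable, Any] = {}
        self._cached_key: Any = _NO_KEY
        self._cached_value: Any = None
        self.lookups = 0            # hash lookups actually performed

    def ll_set(self, key, value) -> None:
        self._items[key] = value
        self._cached_key = _NO_KEY  # every mutation invalidates the cache

    def ll_remove(self, key) -> None:
        del self._items[key]
        self._cached_key = _NO_KEY

    def ll_contains(self, key) -> bool:
        self.lookups += 1
        try:
            value = self._items[key]
        except KeyError:
            self._cached_key = _NO_KEY
            return False
        self._cached_key, self._cached_value = key, value
        return True

    def ll_get(self, key) -> Any:
        if self._cached_key is not _NO_KEY and self._cached_key == key:
            return self._cached_value   # served from the cache
        self.lookups += 1
        return self._items[key]


def ll_dict_getitem(d: CachingDict, key) -> Any:
    # Same two-step shape as the getitem described above.
    if d.ll_contains(key):
        return d.ll_get(key)
    raise KeyError(key)
```

With the cache, a successful ll_dict_getitem costs one lookup instead of two. The price is that every mutation must invalidate the cache, which is the bookkeeping a single-lookup refactoring of ootypesystem.rdict would avoid.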
+
+