[pypy-svn] r26533 - pypy/dist/pypy/doc/discussion
antocuni at codespeak.net
Fri Apr 28 20:58:00 CEST 2006
Author: antocuni
Date: Fri Apr 28 20:57:52 2006
New Revision: 26533
Added:
pypy/dist/pypy/doc/discussion/cli-optimizations.txt (contents, props changed)
Log:
Some ideas on how to optimize gencli.
Added: pypy/dist/pypy/doc/discussion/cli-optimizations.txt
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/doc/discussion/cli-optimizations.txt Fri Apr 28 20:57:52 2006
@@ -0,0 +1,196 @@
+Possible optimizations for the CLI backend
+==========================================
+
+Stack push/pop optimization
+---------------------------
+
+The CLI's VM is a stack-based machine: this fact doesn't play nicely
+with the SSI form the flowgraphs are generated in. At the moment
+gencli does a literal translation of the SSI statements, allocating a
+new local variable for each variable of the flowgraph.
+
+For example, consider the following RPython code and the corresponding
+flowgraph::
+
+ def bar(x, y):
+     foo(x+y, x-y)
+
+
+ inputargs: x_0 y_0
+ v0 = int_add(x_0, y_0)
+ v1 = int_sub(x_0, y_0)
+ v2 = directcall((sm foo), v0, v1)
+
+This is the IL code generated by the CLI backend::
+
+ .locals init (int32 v0, int32 v1, int32 v2)
+
+ block0:
+ ldarg 'x_0'
+ ldarg 'y_0'
+ add
+ stloc 'v0'
+ ldarg 'x_0'
+ ldarg 'y_0'
+ sub
+ stloc 'v1'
+ ldloc 'v0'
+ ldloc 'v1'
+ call int32 foo(int32, int32)
+ stloc 'v2'
+
+As you can see, the results of 'add' and 'sub' are stored in v0 and
+v1, respectively, then v0 and v1 are reloaded onto the stack. These
+store/load pairs are redundant, since the code would work just as well
+without them::
+
+ .locals init (int32 v2)
+
+ block0:
+ ldarg 'x_0'
+ ldarg 'y_0'
+ add
+ ldarg 'x_0'
+ ldarg 'y_0'
+ sub
+ call int32 foo(int32, int32)
+ stloc 'v2'
+
+I've checked the native code generated by the Mono JIT on x86 and I've
+seen that it does not optimize these away. I haven't checked the native
+code generated by the Microsoft CLR yet.
+
+Thus, we might consider optimizing it manually; it should not be too
+difficult, but it is not trivial because we have to make sure that the
+dropped locals are used only once.
+
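The "used only once" check mentioned above can be sketched in a few lines of Python. This is only an illustration, not gencli code: the `(result, opname, args)` tuple layout and the name `single_use_results` are assumptions made up for this example. It also only counts uses; a real pass would additionally have to verify that the values are consumed in the same order in which they were pushed.

```python
from collections import Counter

def single_use_results(operations):
    """Return the results that are read exactly once in the block.

    `operations` is a hypothetical list of (result, opname, args) tuples
    modelling one block of a flowgraph. Only such results are candidates
    for staying on the evaluation stack instead of getting a local.
    """
    uses = Counter()
    for _result, _opname, args in operations:
        uses.update(args)
    return [result for result, _opname, _args in operations
            if uses[result] == 1]

# The bar() example from above, written in the hypothetical tuple form.
ops = [
    ("v0", "int_add", ["x_0", "y_0"]),
    ("v1", "int_sub", ["x_0", "y_0"]),
    ("v2", "directcall", ["foo", "v0", "v1"]),
]
candidates = single_use_results(ops)
```

Here `candidates` contains v0 and v1, the two locals dropped in the hand-optimized IL; v2 is never read in the block, so it is not a candidate.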
+
+Mapping RPython exceptions to native CLI exceptions
+---------------------------------------------------
+
+Both RPython and the CLI have their own set of exception classes: some
+of these are pretty similar; e.g., we have OverflowError,
+ZeroDivisionError and IndexError on the RPython side and
+OverflowException, DivideByZeroException and IndexOutOfRangeException
+on the CLI side.
+
+The first attempt was to map RPython classes to their corresponding
+CLI ones: this worked for simple cases, but it would have triggered
+subtle bugs in more complex ones, because the two exception
+hierarchies don't completely overlap.
+
+For now I've chosen to build an RPython exception hierarchy
+completely independent from the CLI one, but this means that we can't
+rely on exceptions raised by standard operations. The currently
+implemented solution is to do an exception translation on-the-fly; for
+example, 'int_add_ovf' is translated into the following IL code::
+
+ .try
+ {
+ ldarg 'x_0'
+ ldarg 'y_0'
+ add.ovf
+ stloc 'v1'
+ leave __check_block_2
+ }
+ catch [mscorlib]System.OverflowException
+ {
+ newobj instance void class exceptions.OverflowError::.ctor()
+ dup
+ ldsfld class Object_meta pypy.runtime.Constants::exceptions_OverflowError_meta
+ stfld class Object_meta Object::meta
+ throw
+ }
+
+I.e., it catches the builtin OverflowException and raises an RPython
+OverflowError.
+
+I haven't measured timings yet, but I guess that this machinery incurs
+some performance penalty even in the non-overflow case; a
+possible optimization is to do the on-the-fly translation only when it
+is strictly necessary, and to skip it when the except clause catches an
+exception class whose subclass hierarchy is compatible with the
+builtin one. As an example, consider the following RPython code::
+
+ try:
+     return mylist[0]
+ except IndexError:
+     return -1
+
+Given that IndexError has no subclasses, we can map it to
+IndexOutOfRangeException and directly catch this one::
+
+ .try
+ {
+ ldloc 'mylist'
+ ldc.i4 0
+ call int32 getitem(MyListType, int32)
+ ...
+ }
+ catch [mscorlib]System.IndexOutOfRangeException
+ {
+ // return -1
+ ...
+ }
+
+By contrast, we can't do so if the except clause catches classes that
+don't directly map to any builtin class, such as LookupError::
+
+ try:
+     return mylist[0]
+ except LookupError:
+     return -1
+
+This has to be translated in the old way::
+
+ .try
+ {
+ ldloc 'mylist'
+ ldc.i4 0
+
+ .try
+ {
+ call int32 getitem(MyListType, int32)
+ }
+ catch [mscorlib]System.IndexOutOfRangeException
+ {
+ // translate IndexOutOfRangeException into IndexError
+ newobj instance void class exceptions.IndexError::.ctor()
+ dup
+ ldsfld class Object_meta pypy.runtime.Constants::exceptions_IndexError_meta
+ stfld class Object_meta Object::meta
+ throw
+ }
+ ...
+ }
+ catch exceptions.LookupError
+ {
+ // return -1
+ ...
+ }
+
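The decision between "catch the native exception directly" and "translate on-the-fly" could be sketched as follows. This is a hedged illustration only: both tables and the function name `direct_catch_target` are hypothetical, the hierarchy is a small hand-written subset, and the rule used (a CLI counterpart exists and the class has no subclasses) is just the condition given for IndexError above.

```python
# Hypothetical table: RPython exception name -> similar CLI exception.
# The three pairs are the ones listed at the start of this section.
CLI_COUNTERPART = {
    "OverflowError": "System.OverflowException",
    "ZeroDivisionError": "System.DivideByZeroException",
    "IndexError": "System.IndexOutOfRangeException",
}

# Hypothetical, partial view of the RPython hierarchy:
# class name -> names of its direct subclasses.
SUBCLASSES = {
    "LookupError": ["IndexError", "KeyError"],
    "ArithmeticError": ["OverflowError", "ZeroDivisionError"],
}

def direct_catch_target(exc_name):
    """Return the CLI exception to catch directly, or None.

    None means the except clause still needs the on-the-fly
    translation, either because the class has subclasses or because
    no CLI counterpart is known for it.
    """
    if SUBCLASSES.get(exc_name):
        return None
    return CLI_COUNTERPART.get(exc_name)
```

With these tables, `direct_catch_target("IndexError")` yields the native class to catch, while `direct_catch_target("LookupError")` yields None, matching the two cases discussed above.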
+
+Specializing methods of List
+----------------------------
+
+Most methods of RPython lists are implemented by ll_* helpers placed
+in rpython/rlist.py. For some of those we have a direct counterpart
+already implemented in .NET List<>; we could use the oopspec attribute
+for doing an on-the-fly replacement of these low level helpers with
+their builtin counterparts. As an example the 'append' method is
+already mapped to pypylib.List.append. Thanks to Armin Rigo for the
+idea of using oopspec.
+
+
+Doing some caching on Dict
+--------------------------
+
+The current implementations of ll_dict_getitem and ll_dict_get in
+ootypesystem.rdict do two consecutive lookups (calling ll_contains and
+ll_get) on the same key. We might cache the result of
+pypylib.Dict.ll_contains so that the successive ll_get doesn't need a
+lookup. We need some profiling before choosing the best way, though.
+Alternatively, we could directly refactor ootypesystem.rdict to do a
+single lookup.
+
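The caching idea for Dict can be modelled in Python roughly as below. This is a sketch under stated assumptions, not the pypylib.Dict implementation: the class `CachingDict`, its `ll_set` method and the `lookups` counter are invented for the example; only the names `ll_contains` and `ll_get` come from the text above.

```python
class CachingDict:
    """Hypothetical model of a dict that remembers its last successful
    ll_contains() so that the following ll_get() on the same key does
    not hit the underlying table again."""

    _MISSING = object()  # sentinel: "no cached entry" / "key not found"

    def __init__(self):
        self._data = {}
        self._last_key = self._MISSING
        self._last_value = None
        self.lookups = 0  # counts real lookups, for illustration only

    def ll_set(self, key, value):
        self._data[key] = value
        self._last_key = self._MISSING  # any write invalidates the cache

    def ll_contains(self, key):
        self.lookups += 1
        value = self._data.get(key, self._MISSING)
        if value is self._MISSING:
            self._last_key = self._MISSING
            return False
        self._last_key, self._last_value = key, value
        return True

    def ll_get(self, key):
        if self._last_key is not self._MISSING and self._last_key == key:
            return self._last_value  # served from the cache
        self.lookups += 1
        return self._data[key]
```

A contains-then-get pair on the same key then costs one real lookup instead of two. Whether the extra bookkeeping pays off is exactly what the profiling mentioned above would have to show.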
+