[pypy-svn] r38355 - pypy/dist/pypy/doc
antocuni at codespeak.net
antocuni at codespeak.net
Sat Feb 10 10:46:40 CET 2007
Author: antocuni
Date: Sat Feb 10 10:46:39 2007
New Revision: 38355
Added:
pypy/dist/pypy/doc/cli-backend.txt (contents, props changed)
Modified:
pypy/dist/pypy/doc/index.txt
Log:
- Move the JS-backend tutorial to the user section;
- Add some documentation on the CLI backend, partly borrowed from my
thesis
Added: pypy/dist/pypy/doc/cli-backend.txt
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/doc/cli-backend.txt Sat Feb 10 10:46:39 2007
@@ -0,0 +1,431 @@
+===============
+The CLI backend
+===============
+
+The goal of GenCLI is to compile RPython programs to the CLI virtual
+machine.
+
+
+Target environment and language
+===============================
+
+The target of GenCLI is the Common Language Infrastructure environment
+as defined by the `Standard Ecma 335`_.
+
+While in an ideal world we might suppose GenCLI to run fine with
+every implementation conforming to that standard, we know the world we
+live in is far from ideal, so extra efforts can be needed to mantain
+compatibility with more than one implementation.
+
+At the moment of writing the two most popular implementations of the
+standard are supported: Microsoft Common Language Runtime (CLR) and
+Mono.
+
+Then we have to choose how to generate the real executables. There are
+two main alternatives: generating source files in some high level
+language (such as C#) or generating assembly level code in
+Intermediate Language (IL).
+
+The IL approach is much faster during the code generation
+phase, because it doesn't need to call a compiler. By contrast the
+high level approach has two main advantages:
+
+ - the code generation part could be easier because the target
+ language supports high level control structures such as
+ structured loops;
+
+ - the generated executables take advantage of compiler's
+ optimizations.
+
+In reality the first point is not an advantage in the PyPy context,
+because the `flow graph`_ we start from is quite low level and Python
+loops are already expressed in terms of branches (i.e., gotos).
+
+About the compiler optimizations we must remember that the flow graph
+we receive from earlier stages is already optimized: PyPy implements
+a number of optimizations such a constant propagation and
+dead code removal, so it's not obvious if the compiler could
+do more.
+
+Moreover by emitting IL instruction we are not constrained to rely on
+compiler choices but can directly choose how to map CLI opcodes: since
+the backend often know more than the compiler about the context, we
+might expect to produce more efficient code by selecting the most
+appropriate instruction; e.g., we can check for arithmetic overflow
+only when strictly necessary.
+
+The last but not least reason for choosing the low level approach is
+flexibility in how to get an executable starting from the IL code we
+generate:
+
+ - write IL code to a file, then call the ilasm assembler;
+
+ - directly generate code on the fly by accessing the facilities
+ exposed by the System.Reflection.Emit API.
+
+The second point is not feasible yet because at the moment there is no
+support for accessing system libraries, but in future it could lead to
+an interesting GenCLI feature, i.e. the ability of emitting dynamic
+code at runtime.
+
+
+Handling platform differences
+=============================
+
+Since our goal is to support both Microsoft CLR we have to handle the
+differences between the twos; in particular the main differences are
+in the name of the helper tools we need to call:
+
+=============== ======== ======
+Tool CLR Mono
+=============== ======== ======
+IL assembler ilasm ilasm2
+C# compiler csc gmcs
+Runtime ... mono
+=============== ======== ======
+
+The code that handles these differences is located in the sdk.py
+module: it defines an abstract class exposing some methods returning
+the name of the helpers and one subclass for each of the two supported
+platforms.
+
+
+Targeting the CLI Virtual Machine
+=================================
+
+In order to write a CLI backend we have to take a number of decisions.
+First, we have to choose the typesystem to use: given that CLI
+natively supports primitives like classes and instances,
+ootypesystem is the most natural choice.
+
+Once the typesystem has been chosen there is a number of steps we have
+to do for completing the backend:
+
+ - map ootypesystem's types to CLI Common Type System's
+ types;
+
+ - map ootypesystem's low level operation to CLI instructions;
+
+ - map Python exceptions to CLI exceptions;
+
+ - write a code generator that translates a flow graph
+ into a list of CLI instructions;
+
+ - write a class generator that translates ootypesystem
+ classes into CLI classes.
+
+
+Mapping primitive types
+-----------------------
+
+The `rtyper` give us a flow graph annotated with types belonging to
+ootypesystem: in order to produce CLI code we need to translate these
+types into their Common Type System equivalents.
+
+For numeric types the conversion is straightforward, since
+there is a one-to-one mapping between the two typesystems, so that
+e.g. Float maps to float64.
+
+For character types the choice is more difficult: RPython has two
+distinct types for plain ASCII and Unicode characters (named UniChar),
+while .NET only supports Unicode with the char type. There are at
+least two ways to map plain Char to CTS:
+
+ - map UniChar to char, thus mantaining the original distinction
+ between the two types: this has the advantage of being a
+ one-to-one translation, but has the disadvantage that RPython
+ strings will not be recognized as .NET strings, since they only
+ would be sequences of bytes;
+
+ - map both char, so that Python strings will be treated as strings
+ also by .NET: in this case there could be problems with existing
+ Python modules that use strings as sequences of byte, such as the
+ built-in struct module, so we need to pay special attention.
+
+We think that mapping Python strings to .NET strings is
+fundamental, so we chose the second option.
+
+Mapping built-in types
+----------------------
+
+As we saw in section ootypesystem defines a set of types that take
+advantage of built-in types offered by the platform.
+
+For the sake of simplicity we decided to write wrappers
+around .NET classes in order to match the signatures required by
+pypylib.dll:
+
+=================== ===========================================
+ootype CLI
+=================== ===========================================
+String System.String
+StringBuilder System.Text.StringBuilder
+List System.Collections.Generic.List<T>
+Dict System.Collections.Generic.Dictionary<K, V>
+CustomDict pypy.runtime.Dict
+DictItemsIterator pypy.runtime.DictItemsIterator
+=================== ===========================================
+
+Wrappers exploit inheritance for wrapping the original classes, so,
+for example, pypy.runtime.List<T> is a subclass of
+System.Collections.Generic.List<T> that provides methods whose names
+match those found in the _GENERIC_METHODS of ootype.List
+
+The only exception to this rule is the String class, which is not
+wrapped since in .NET we can not subclass System.String. Instead, we
+provide a bunch of static methods in pypylib.dll that implement the
+methods declared by ootype.String._GENERIC_METHODS, then we call them
+by explicitly passing the string object in the argument list.
+
+
+Mapping instructions
+--------------------
+
+PyPy's low level operations are expressed in Static Single Information
+(SSI) form, such as this::
+
+ v2 = int_add(v0, v1)
+
+By contrast the CLI virtual machine is stack based, which means the
+each operation pops its arguments from the top of the stacks and
+pushes its result there. The most straightforward way to translate SSI
+operations into stack based operations is to explicitly load the
+arguments and store the result into the appropriate places::
+
+ LOAD v0
+ LOAD v1
+ int_add
+ STORE v2
+
+The code produced works correctly but has some inefficiency issue that
+can be addressed during the optimization phase.
+
+The CLI Virtual Machine is fairly expressive, so the conversion
+between PyPy's low level operations and CLI instruction is relatively
+simple: many operations maps directly to the correspondent
+instruction, e.g int_add and sub.
+
+By contrast some instructions do not have a direct correspondent and
+have to be rendered as a sequence of CLI instructions: this is the
+case of the "less-equal" and "greater-equal" family of instructions,
+that are rendered as "greater" or "less" followed by a boolean "not",
+respectively.
+
+Finally, there are some instructions that cannot be rendered directly
+without increasing the complexity of the code generator, such as
+int_abs (which returns the absolute value of its argument). These
+operations are translated by calling some helper function written in
+C#.
+
+The code that implements the mapping is in the modules opcodes.py.
+
+Mapping exceptions
+------------------
+
+Both RPython and CLI have its own set of exception classes: some of
+these are pretty similar; e.g., we have OverflowError,
+ZeroDivisionError and IndexError on the first side and
+OverflowException, DivideByZeroException and IndexOutOfRangeException
+on the other side.
+
+The first attempt was to map RPython classes to their corresponding
+CLI ones: this worked for simple cases, but it would have triggered
+subtle bugs in more complex ones, because the two exception
+hierarchies don't completely overlap.
+
+At the moment we've choosen to build an RPython exception hierarchy
+completely independent from the CLI one, but this means that we can't
+rely on exceptions raised by built-in operations. The currently
+implemented solution is to do an exception translation on-the-fly.
+
+As an example consider the RPython int_add_ovf operation, that sums
+two integers and raises an OverflowError exception in case of
+overflow. For implementing it we can use the built-in add.ovf CLI
+instruction that raises System.OverflowExcepion when the result
+overflows, catch that exception and throw a new one::
+
+ .try
+ {
+ ldarg 'x_0'
+ ldarg 'y_0'
+ add.ovf
+ stloc 'v1'
+ leave __check_block_2
+ }
+ catch [mscorlib]System.OverflowException
+ {
+ newobj instance void class OverflowError::.ctor()
+ throw
+ }
+
+
+Translating flow graphs
+-----------------------
+
+As we saw previously in PyPy\ function and method bodies are
+represented by flow graphs that we need to translate CLI IL code. Flow
+graphs are expressed in a format that is very suitable for being
+translated to low level code, so that phase is quite straightforward,
+though the code is a bit involed because we need to take care of three
+different types of blocks.
+
+The code doing this work is located in the Function.render
+method in the file function.py.
+
+First of all it searches for variable names and types used by
+each block; once they are collected it emits a .local IL
+statement used for indicating the virtual machine the number and type
+of local variables used.
+
+Then it sequentally renders all blocks in the graph, starting from the
+start block; special care is taken for the return block which is
+always rendered at last to meet CLI requirements.
+
+Each block starts with an unique label that is used for jumping
+across, followed by the low level instructions the block is composed
+of; finally there is some code that jumps to the appropriate next
+block.
+
+Conditional and unconditional jumps are rendered with their
+corresponding IL instructions: brtrue, brfalse.
+
+Blocks that needs to catch exceptions use the native facilities
+offered by the CLI virtual machine: the entire block is surrounded by
+a .try statement followed by as many catch as needed: each catching
+sub-block then branches to the appropriate block::
+
+
+ # RPython
+ try:
+ # block0
+ ...
+ except ValueError:
+ # block1
+ ...
+ except TypeError:
+ # block2
+ ...
+
+ // IL
+ block0:
+ .try {
+ ...
+ leave block3
+ }
+ catch ValueError {
+ ...
+ leave block1
+ }
+ catch TypeError {
+ ...
+ leave block2
+ }
+ block1:
+ ...
+ br block3
+ block2:
+ ...
+ br block3
+ block3:
+ ...
+
+There is also an experimental feature that makes GenCLI to use its own
+exception handling mechanism instead of relying on the .NET
+one. Surprisingly enough, benchmarks are about 40% faster with our own
+exception handling machinery.
+
+
+Translating classes
+-------------------
+
+As we saw in section sec:ootypes, the semantic of ootypesystem classes
+is very similar to the .NET one, so the translation is mostly
+straightforward.
+
+The related code is located in the module class\_.py. Rendered classes
+are composed of four parts:
+
+ - fields;
+ - user defined methods;
+ - default constructor;
+ - the ToString method, mainly for testing purposes
+
+Since ootype implicitly assumes all method calls to be late bound, as
+an optimization before rendering the classes we search for methods
+that are not overridden in subclasses, and declare as "virtual" only
+the one that needs to.
+
+The constructor does nothing more than calling the base class
+constructor and initializing class fields to their default value.
+
+Inheritance is straightforward too, as it is natively supported by
+CLI. The only noticeable thing is that we map ootypesystem's ROOT
+class to the CLI equivalent System.Object.
+
+The Runtime Environment
+-----------------------
+
+The runtime environment is a collection of helper classes and
+functions used and referenced by many of the GenCLI submodules. It is
+written in C#, compiled to a DLL (Dynamic Link Library), then linked
+to generated code at compile-time.
+
+The DLL is called pypylib and is composed of three parts:
+
+ - a set of helper functions used to implements complex RPython
+ low-level instructions such as runtimenew and ooparse_int;
+
+ - a set of helper classes wrapping built-in types
+
+ - a set of helpers used by the test framework
+
+
+The first two parts are contained in the pypy.runtime namespace, while
+the third is in the pypy.test one.
+
+
+Testing GenCLI
+==============
+
+As the rest of PyPy, GenCLI is a test-driven project: there is at
+least one unit test for almost each single feature of the
+backend. This development methodology allowed us to early discover
+many subtle bugs and to do some big refactoring of the code with the
+confidence not to break anything.
+
+The core of the testing framework is in the module
+pypy.translator.cli.test.runtest; one of the most important function
+of this module is compile_function(): it takes a Python function,
+compiles it to CLI and returns a Python object that runs the just
+created executable when called.
+
+This way we can test GenCLI generated code just as if it were a simple
+Python function; we can also directly run the generated executable,
+whose default name is main.exe, from a shell: the function parameters
+are passed as command line arguments, and the return value is printed
+on the standard output::
+
+ # Python source: foo.py
+ from pypy.translator.cli.test.runtest import compile_function
+
+ def foo(x, y):
+ return x+y, x*y
+
+ f = compile_function(foo, [int, int])
+ assert f(3, 4) == (7, 12)
+
+
+ # shell
+ $ mono main.exe 3 4
+ (7, 12)
+
+GenCLI supports only few RPython types as parameters: int, r_uint,
+r_longlong, r_ulonglong, bool, float and one-length strings (i.e.,
+chars). By contrast, most types are fine for being returned: these
+include all primitive types, list, tuples and instances.
+
+
+
+.. _`Standard Ecma 335`: http://www.ecma-international.org/\\publications/standards/Ecma-335.htm
+.. _`flow graph`: http://codespeak.net/pypy/dist/pypy/doc/translation.html#the-flow-model
+.. _`rtyper`: http://codespeak.net/pypy/dist/pypy/doc/rtyper.html
Modified: pypy/dist/pypy/doc/index.txt
==============================================================================
--- pypy/dist/pypy/doc/index.txt (original)
+++ pypy/dist/pypy/doc/index.txt Sat Feb 10 10:46:39 2007
@@ -24,6 +24,9 @@
to write modules in PyPy's style and compile them into regular CPython
extension modules.
+`JavaScript backend`_ describes how to use the JavaScript backend to create
+AJAX-based web pages.
+
Status_ of the project.
@@ -113,8 +116,7 @@
`rlib`_ describes some modules that can be used when implementing programs in
RPython.
-`JavaScript backend`_ describes how to use the JavaScript backend to create
-AJAX-based web pages.
+`CLI backend`_ describes the details of the .NET backend.
`JIT Generation in PyPy`_ describes how we produce the Python Just-in-time Compiler
from our Python interpreter.
@@ -300,7 +302,7 @@
.. _`translation`: translation.html
.. _`GenC backend`: translation.html#genc
.. _`LLVM backend`: translation.html#llvm
-.. _`CLI backend`: translation.html#gencli
+.. _`CLI backend`: cli-backend.html
.. _`Common Lisp backend`: translation.html#gencl
.. _`Squeak backend`: translation.html#gensqueak
.. _`revision report`: http://codespeak.net/pypy/rev/current
More information about the Pypy-commit
mailing list