[pypy-svn] rev 1271 - pypy/trunk/doc/EU_funding

Sun Sep 7 18:49:17 CEST 2003

Author: arigo
Date: Sun Sep  7 18:49:16 2003
New Revision: 1271

Added:
   pypy/trunk/doc/EU_funding/plan.txt
Log:
plan - draft

Added: pypy/trunk/doc/EU_funding/plan.txt
==============================================================================

--- (empty file)
+++ pypy/trunk/doc/EU_funding/plan.txt	Sun Sep  7 18:49:16 2003
@@ -0,0 +1,190 @@
+=========================
+Draft of a PyPy work plan
+=========================
+
+
+1. The PyPy Interpreter
+-----------------------
+
+The goal is to make a complete Python interpreter that runs under any
+existing Python implementation.
+
+ a) develop and complete the PyPy interpreter itself, as a regular
+Python program, until it contains all the parts of CPython that we don't
+want to move to (b). Further investigate the unorthodox multimethod
+concepts that the standard object space is based on, and how to hook in
+the bytecode compiler.
+
+ b) translate all other parts of CPython into regular Python libraries.
+These should also work without PyPy, being just plain-Python
+replacements for existing CPython functionality. This includes the
+bytecode compiler.
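
The multimethod idea mentioned in (a) can be illustrated with a minimal sketch: an operation picks its implementation from the types of all its arguments, not only the first one. The class `MultiMethod` and the registered functions below are hypothetical names chosen for illustration; this is not the standard object space's actual mechanism, and it matches exact types only (no inheritance or delegation).

```python
class MultiMethod:
    """Toy multimethod: selects an implementation from the types of all arguments."""

    def __init__(self, name):
        self.name = name
        self.registry = {}  # maps a tuple of argument types to an implementation

    def register(self, *types):
        # Decorator that records func as the implementation for these types.
        def decorator(func):
            self.registry[types] = func
            return func
        return decorator

    def __call__(self, *args):
        key = tuple(type(a) for a in args)
        try:
            impl = self.registry[key]
        except KeyError:
            raise TypeError("no %s implementation for %r" % (self.name, key))
        return impl(*args)


add = MultiMethod("add")  # hypothetical operation, for illustration only


@add.register(int, int)
def add_int_int(a, b):
    return a + b


@add.register(str, str)
def add_str_str(a, b):
    return a + b


@add.register(int, float)
def add_int_float(a, b):
    return float(a) + b
```

Calling `add(2, 3)` dispatches to `add_int_int`, while `add(1, "x")` raises `TypeError` because no implementation is registered for `(int, str)`.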
+
+
+2. Translation of RPython
+-------------------------
+
+The goal is to be able to translate arbitrary RPython source code (e.g. 
+the one produced in 1a) into low-level code (C, Pyrex, Java, others). 
+This includes making a stand-alone tool, not tied to PyPy, for general
+optimization of arbitrary but suitably restricted Python applications or
+parts thereof.
+
+ a) analyse code to produce the relevant typing information. Investigate
+if we can use the annotation object space only or if additional
+AST-based control flow analysis is needed.
+
+ b) produce low-level code out of the data gathered in (a). Again
+investigate how this is best done (AST-guided translation or
+reverse-engineering of the low-level control flow gathered by the
+annotation object space). Compare the different low-level environments
+that we could target (C, Pyrex, others?).
+
+
+3. Bootstrapping PyPy
+---------------------
+
+The goal is to put (1) and (2) together.
+
+ a) investigate the problems specific to the global
+translation of PyPy, as opposed to those common to any RPython program.
+According to the requirements and insights of (2) we will probably have
+to redesign specific parts of PyPy, e.g. make the various
+app-level/interp-level interface designs converge.
+
+ b) build the low-level-specific run-time components of PyPy, most
+notably the object layout, the memory management, possibly threading
+support, and multimethod dispatch. Here, if we target C code, important
+parts can be directly re-used from CPython.
+
+
+4. High-performance PyPy-Python
+-------------------------------
+
+The goal is to optimize (3) in possibly various ways, building on its
+flexibility to go beyond CPython.
+
+ a) develop several object implementations for the same types, as
+explicitly allowed by the standard object space, and develop heuristics
+to switch between implementations during execution.
+
+ b) identify which optimizations would benefit from support from the
+translator (2). These are the optimizations not easily available to
+CPython because they would require large-scale code rewrites.
+
+ c) for each issue, work on several solutions when none is obviously
+better than the others. The meta-programming underlying (b) --
+namely the work on the translator instead of on the resulting code -- is
+what gives us the possibility of actually implementing several very
+different schemes.
+
+ d) integrate existing technology that traditionally depended on closely
+following CPython's code base, notably Psyco and Stackless. Rewrite each
+one as a meta-component that hooks into the translator (2) plus a
+dedicated run-time component (3b). Further develop these technologies
+based on the results gathered in (c), e.g. identify when these
+technologies would guide specific choices among the solutions developed 
+in (a) and (b).
+
+
+Annex to (a)
+~~~~~~~~~~~~
+
+Some major uses for several implementations of the built-in types:
+
+ * dictionaries as hash-table vs. plain (key, value) lists vs. b-trees, 
+or with string-only or integer-only keys. Dictionaries with specific 
+support for "on-change" callbacks (useful for Psyco).
+
+ * strings as plain immutable memory buffers vs. immutable but more 
+complex data structures (see functional languages) vs. internally 
+mutable data structures (e.g. Psyco's concatenated strings)
+
+ * ints as machine words vs. two machine words vs. internal longs vs. 
+external bignum library (investigate if completely unifying ints and
+longs is possible in the Python language at this stage).
+
+ * etc. (lists as range() or linked lists, ...)
+
+The above are mostly independent from any particular low-level run-time 
+environment.
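
As a rough illustration of item 4(a) and the dictionary example above, here is a minimal sketch of one type backed by two interchangeable implementations, with a simple heuristic that switches between them at run time. All names (`ListStrategy`, `HashStrategy`, `AdaptiveDict`, `SWITCH_THRESHOLD`) are hypothetical, and the threshold is an arbitrary placeholder, not a measured value.

```python
class ListStrategy:
    """Plain list of (key, value) pairs; cheap for a handful of items."""

    def __init__(self):
        self.pairs = []

    def get(self, key):
        for k, v in self.pairs:
            if k == key:
                return v
        raise KeyError(key)

    def set(self, key, value):
        for i, (k, _) in enumerate(self.pairs):
            if k == key:
                self.pairs[i] = (key, value)
                return
        self.pairs.append((key, value))

    def size(self):
        return len(self.pairs)

    def items(self):
        return list(self.pairs)


class HashStrategy:
    """Hash table; here simply delegated to the host Python's dict."""

    def __init__(self, items=()):
        self.table = dict(items)

    def get(self, key):
        return self.table[key]

    def set(self, key, value):
        self.table[key] = value

    def size(self):
        return len(self.table)

    def items(self):
        return list(self.table.items())


class AdaptiveDict:
    """Same interface throughout; the implementation changes as it grows."""

    SWITCH_THRESHOLD = 8  # hypothetical cut-off, an assumption of this sketch

    def __init__(self):
        self.impl = ListStrategy()

    def __getitem__(self, key):
        return self.impl.get(key)

    def __setitem__(self, key, value):
        self.impl.set(key, value)
        # Heuristic: move to a hash table once the list becomes too long.
        if (isinstance(self.impl, ListStrategy)
                and self.impl.size() > self.SWITCH_THRESHOLD):
            self.impl = HashStrategy(self.impl.items())

    def __len__(self):
        return self.impl.size()
```

User code only sees `AdaptiveDict`; the switch from list to hash table happens silently once more than `SWITCH_THRESHOLD` keys are stored.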
+
+
+Annex to (b)
+~~~~~~~~~~~~
+
+Here are some of the main issues and tricks. Note that compatibility
+with legacy C extensions can be achieved by choosing, for each of the
+following issues, the same solution as CPython did.
+
+ * object layout and memory management strategy or strategies, e.g.
+reference counting vs. Boehm garbage collection vs. our own. Includes
+speed vs. data size trade-offs.
+
+ * code size vs. speed trade-offs (e.g. whether the final interpreter
+should still include compact precompiled bytecode or be completely
+translated into C).
+
+ * the complex issue of threading (global interpreter lock vs.
+alternatives).
+
+ * multimethod dispatching
+
+ * pointer tagging, e.g. encoding an integer object as a pointer with a 
+special value instead of a real pointer to a data structure representing 
+the integer.
+
+The above are mostly specific to a particular low-level run-time.
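
The pointer-tagging item above can be sketched as follows. Real pointer tagging operates on machine words in the low-level run-time; this is only a simulation in plain Python, where a "word" is a Python integer and the heap is a dictionary. It assumes that real object addresses are word-aligned (low bit 0), so the low bit is free to mark small integers. All names here (`heap`, `box`, `tag_int`, `load`, ...) are hypothetical.

```python
TAG_BITS = 1   # number of low bits reserved for the tag
INT_TAG = 1    # low bit set means "this word is an integer, not an address"

heap = {}              # simulated heap: address -> object
_next_address = [8]    # fake addresses, multiples of 8, so the low bit is 0


def box(obj):
    """Store obj on the simulated heap and return its (even) address."""
    address = _next_address[0]
    _next_address[0] += 8
    heap[address] = obj
    return address


def tag_int(value):
    """Encode a small integer directly in the word, with no heap allocation."""
    return (value << TAG_BITS) | INT_TAG


def is_tagged_int(word):
    return (word & INT_TAG) == INT_TAG


def untag_int(word):
    """Recover the integer; the arithmetic shift keeps the sign."""
    return word >> TAG_BITS


def load(word):
    """Return the value a word stands for, whether tagged int or heap object."""
    if is_tagged_int(word):
        return untag_int(word)
    return heap[word]
```

With this encoding, `load(tag_int(42))` yields 42 without touching the heap, while `load(box("hello"))` goes through the simulated heap lookup.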
+
+
+5. Low-level targets, tools and releases
+----------------------------------------
+
+The goal is to identify, among those low-level targets that are in
+widespread use (e.g. workstation usage vs. web server vs.
+high-performance computing vs. memory-starved hand-held device; C/Unix
+vs. Java vs. .NET environment), which ones would benefit most from a
+high-performance Python interpreter. For each of these, focus will be
+given to:
+
+ a) develop the translation process, run-time and those optimizations
+that depend on low-level details.
+
+ b) design interfaces for extension modules. Some can be very general
+(e.g. a pure Python one that should allow generic third-party code to
+hook into the PyPy interpreter source code without worrying about the
+translation process). Others depend on the low-level environment and on
+the choices made for the issues of (4).
+
+ c) combine different solutions for the different issues discussed in
+(4). Gather statistics with real-world Python applications. Compare the
+results. This is where the flexibility of the whole project is vital.
+Typically, very different trade-offs need to be made on different
+environments.
+
+ d) most importantly, develop tools that let third parties easily
+repeat (c) in their own domain and build their own tailored versions of
+PyPy.
+
+ e) release a few official versions pre-tailored for various common
+environments. Develop in particular a version whose goal is to simulate
+the existing CPython interpreter to support legacy extension modules. 
+Investigate if the PyPy core can make internal choices that are very
+different from CPython's without sacrificing compatibility with legacy
+extension modules.
+
+
+6. Infrastructure
+-----------------
+
+The goal is to address the development and maintenance issues.
+
+ a) PyPy's own development needs an infrastructure that must
+continuously be kept up-to-date and further developed.
+
+ b) write tests. All parts of PyPy should be extensively covered by
+stress tests. Investigate the use of test-coverage analysers.
+
+ c) investigate means of keeping PyPy in sync with the future
+developments of CPython, e.g. ways to relate pieces of PyPy source and
+pieces of CPython source. Look for existing solutions.