[pypy-svn] rev 826 - pypy/trunk/doc

lac at codespeak.net lac at codespeak.net
Tue Jun 17 20:50:33 CEST 2003


Author: lac
Date: Tue Jun 17 20:50:32 2003
New Revision: 826

Added:
   pypy/trunk/doc/oscon2003-paper.txt
Modified:
   pypy/trunk/doc/builtins.txt
Log:
Holger had a better idea.  Put the OSCON paper here.


Modified: pypy/trunk/doc/builtins.txt
==============================================================================
--- pypy/trunk/doc/builtins.txt	(original)
+++ pypy/trunk/doc/builtins.txt	Tue Jun 17 20:50:32 2003
@@ -132,7 +132,7 @@
 deon pow
 n t  property
 n v  quit -- (see exit)
-n pi range
+d pi range
 nepn raw_input
 n pn reduce
 nhpn reload

Added: pypy/trunk/doc/oscon2003-paper.txt
==============================================================================
--- (empty file)
+++ pypy/trunk/doc/oscon2003-paper.txt	Tue Jun 17 20:50:32 2003
@@ -0,0 +1,519 @@
+Implementing Python in Python  -- A report from the PyPy project
+
+The PyPython project aims at producing a simple runtime-system for the
+Python language, written in Python itself.  Sooner or later, this
+happens to most interesting computer languages.  The temptation is
+great.  Each significant computer language has a certain
+expressiveness and power, and it is frustrating to not be able to use
+that expressiveness and power when writing the language itself.  Thus
+we have Scheme, Lisp-written-in-Lisp, and Squeak,
+Smalltalk-written-in-Smalltalk.  So why not Python-written-in-Python?
+
+Besides using the expressiveness of Python to write Python, we also
+aim to produce a minimal core which is Simple and Flexible, and no
+longer dependent on CPython.  We will take care that PyPy will integrate
+easily with PSYCO and Stackless -- goals that are attainable with both
+Armin Rigo (author of PSYCO) and Christian Tismer (author of Stackless)
+on the team.  Samuele Pedroni catches us when we unwittingly make
+C-ish assumptions.  By keeping things Simple and Flexible we can
+produce code that has attractions for both industry and academia.
+Academics will find it even easier to teach concepts of language design
+with this Python.  And ending the dependence on CPython means
+that we can produce a Python with a smaller footprint.  Eventually,
+we would like to produce a faster Python.  We are very far from that
+now, since we have spent no effort on speed and have only worked on
+Simple and Flexible.
+
+1. How have we set about it?
+
+Most of you know what happens if you type
+
+'import this'
+
+at your favourite Python prompt.  You get 'The Zen of Python',
+by Tim Peters.  It starts
+
+        Beautiful is better than ugly.
+        Explicit is better than implicit.
+        Simple is better than complex.
+
+and ends with:
+
+        Namespaces are one honking great idea -- let's do more of those!
+
+What would 'doing more of those'  mean?  Here is one approach.
+
+In a Python-like language, a running interpreter has three main parts:
+
+    * the main loop, which shuffles data around and calls the operations
+      defined in the object library according to the bytecode;
+
+    * the compiler, which represents the static optimization of the
+      source code into an intermediate format, the bytecode;
+
+    * the object library, implementing the various types of objects
+      and their semantics.
+
+In PyPy, the three parts are clearly separated and can be replaced
+independently.  The main loop generally assumes little about the semantics
+of the objects: they are essentially black boxes (PyObject pointers). The
+interpreter stack and the variables only contain such black boxes.
+Every operation is done via calls to the object library, such as
+PyNumber_Add().  We haven't done much to make the compiler and the main
+loop into explicit concepts (yet),  because we have been concentrating
+on making separable object libraries.
+
+We call the separable object library an Object Space.
+We call Wrapped Objects the black boxes of an Object Space.
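+
+As a rough illustration of this separation, here is a minimal sketch in
+present-day Python.  The names TinySpace and run, and the two-opcode
+instruction set, are invented for this example and are not PyPy's actual
+code.  The main loop treats every value as a black box and leaves all
+semantics to whichever Object Space it is handed:
+
```python
class TinySpace:
    """Hypothetical object space: wrapped objects are plain Python objects."""

    def wrap(self, value):
        return value

    def unwrap(self, w_obj):
        return w_obj

    def add(self, w_x, w_y):
        return w_x + w_y


def run(space, program):
    """Hypothetical main loop: shuffles black boxes, never inspects them."""
    stack = []
    for opcode, arg in program:
        if opcode == "LOAD_CONST":
            stack.append(space.wrap(arg))
        elif opcode == "BINARY_ADD":
            w_y = stack.pop()
            w_x = stack.pop()
            stack.append(space.add(w_x, w_y))
        else:
            raise ValueError("unknown opcode: %r" % (opcode,))
    return stack.pop()


program = [("LOAD_CONST", 1), ("LOAD_CONST", 2), ("BINARY_ADD", None)]
space = TinySpace()
result = space.unwrap(run(space, program))
```
+
+Replacing TinySpace with another class that offers the same three
+methods changes what the program means without touching run.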
+
+But the exciting thing is that while existing languages implement _one_
+Object Space, by separating things we have produced an architecture
+which will enable us to run more than one Object Space in the same
+interpreter at the same time.  This idea has some exciting implications.
+
+So let us dream for a bit.
+
+First dream: How do you get computer science concepts in language design
+more effectively into student brains?
+
+Traditionally, academics come up with interesting new ideas and
+concepts but, to the extent that these are truly new, end up creating
+another language to express these ideas in.  Unfortunately, many
+languages end up merely as a vehicle for too few new ideas.  We end up
+force-feeding our poor students with too many computer languages, too
+quickly -- each language designed to teach a particular point.  And
+many of our languages are particularly weak on everything _but_ the
+point we wish to make.
+
+Things would go a lot more smoothly if we could only create an Object
+Space which obeys the new rules we have thought up, and drop it into
+an existing language.  Comparisons between other ways of doing things
+would also be a lot simpler.  Finally, we could reasonably ask our
+students to implement these ideas in Python and let them drop them in,
+leaving all the other bits, irrelevant for our educational purposes, as
+they are already written.  There is no better way to learn about
+compiler writing than writing compilers, but a lot of today's
+education in compiler writing leaves a huge gap between 'the theory
+that is in the book which the student is expected to learn' and 'what
+is reasonable for a student to implement as coursework'.  Students can
+spend all semester overcoming difficulties in _actually getting the IO
+to work_, and _interfacing with the runtime libraries_, while only
+spending a fraction of the time on the concepts which you are trying
+to teach.
+
+Object Spaces will provide a better fit between the abstract
+concepts we wish to teach and the code written to implement just that.
+
+Dream number Two: A Slimmer Python
+
+People who write code for handhelds, and others who embed Python into
+other devices, often wish that they could have a much smaller
+footprint.  Now they can ask for a Tiny Object Space which only
+implements the behaviour which they need, and skips the parts that they
+do not.
+
+Dream number Three -- What is the best way to implement a dict?
+
+This depends on how much data you intend to store in your dict.  If you
+never expect your dict to have more than a half dozen items, a really
+fast list may be best.  Larger dicts might best be implemented as
+hashes.  And for storing enormous amounts of data, a binary tree
+might be just what you would be interested in.  In principle, there is
+nothing to stop your interpreter from keeping statistics on how it is
+being used, and to move from strategy to strategy at runtime.
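+
+A minimal sketch of such a strategy switch follows, in present-day
+Python.  The class AdaptiveMapping and its threshold are assumptions
+made for illustration, not part of PyPy.  It starts with a list of pairs
+and moves to a hash table once it outgrows "a half dozen items":
+
```python
class AdaptiveMapping:
    """Hypothetical mapping that starts as a list of pairs and switches
    to a hash table once it grows past a threshold."""

    SMALL_LIMIT = 6  # "a half dozen items"; the exact cut-off is a guess

    def __init__(self):
        self._pairs = []      # small strategy: linear scan over (key, value)
        self._table = None    # large strategy: built-in dict (a hash table)

    def __setitem__(self, key, value):
        if self._table is not None:
            self._table[key] = value
            return
        for i, (k, _) in enumerate(self._pairs):
            if k == key:
                self._pairs[i] = (key, value)
                return
        self._pairs.append((key, value))
        if len(self._pairs) > self.SMALL_LIMIT:
            # Switch strategy at runtime.
            self._table = dict(self._pairs)
            self._pairs = None

    def __getitem__(self, key):
        if self._table is not None:
            return self._table[key]
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)

    def strategy(self):
        return "list" if self._table is None else "hash"


m = AdaptiveMapping()
for n in range(6):
    m[n] = n * n
small = m.strategy()      # six items: still the list strategy
m[6] = 36
large = m.strategy()      # seventh item: switched to the hash strategy
```
+
+A real implementation would choose its thresholds from measurements,
+and could add a tree-based strategy for very large mappings.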
+
+Dream number Four -- How would you like your operators to work today?
+
+Consider y = int(x).  How would you like this to work when x is 4.2,
+4.5, -4.2 and -4.5?  Currently Python says 4, 4, -4 and -4, truncating
+towards zero.  But for certain applications, this is not what is desired.
+You would prefer rounding behaviour: 4, 5, -4, -5 -- rounding to nearest,
+with halves away from zero.  Or you would like to always return the
+larger integer: 5, 5, -4, -4.  Sometimes
+just running an existing program and changing such behaviour can reveal
+interesting embedded assumptions of which the author may be unaware.
+
+Changing the behaviour and seeing how the results change ought to be
+straight-forward, simple, and easy.
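+
+The three behaviours can be compared side by side.  This is a sketch in
+present-day Python; the helper names are invented here, and an Object
+Space would substitute one of them for the conversion instead of
+calling it explicitly:
+
```python
import math


def truncate(x):
    """Current Python behaviour of int(x): drop the fractional part."""
    return int(x)


def round_half_away(x):
    """Round to nearest, with halves going away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def ceiling(x):
    """Always return the larger neighbouring integer."""
    return math.ceil(x)


samples = [4.2, 4.5, -4.2, -4.5]
table = {
    "truncate": [truncate(x) for x in samples],
    "round_half_away": [round_half_away(x) for x in samples],
    "ceiling": [ceiling(x) for x in samples],
}
```
+
+The helper round_half_away is written out by hand because the built-in
+round() in Python 3 rounds halves to the nearest even integer, which
+gives 4 for 4.5.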
+
+Dream number Five -- Running different Object Spaces on different processors
+of the same machine.
+
+Dream number Six -- Running different Object Spaces on different machines.
+
+This is one of the unrealised dreams of distributed computing.  It would
+often be convenient to allow the various machines on a network to share
+cpu resources.  Thus one could begin a computation on a small device,
+say a mobile phone, or a PDA, and have the program automatically notice
+that the computation was too strenuous for such a device, and automatically
+forward the computation to a machine with more computational power.
+
+Dream number Seven -- A Smarter, more Dynamic Interpreter
+
+There is no reason why your Python interpreter could not keep statistics
+of how it is being used, and automatically select from a collection of
+algorithms the one which is best suited for the data at hand.
+
+Dream number Eight -- How to avoid painful conversion of your code base
+to new versions of the language.
+
+This dream is a bit far-fetched, but it is worth investigating.  Right
+now, whenever a new release of Python comes out, existing Python
+programs have to be modified whenever there are conflicts.  Thus there
+is a trade off between getting the new features which contribute to
+increased productivity in program design, and having to fix piles of
+old code that wasn't broken until the language changed.  With this
+approach it may be possible to have your cake and eat it too.  You
+could demand that all your old modules use PyPy Object Space Nov 16
+2004, while immediately having your new modules use the brand new PyPy
+Object Space which was defined yesterday.  You could update any old
+modules that would particularly benefit from having the new features,
+and leave the old ones alone.
+
+Dream number Nine:  faster Python
+
+While we are writing an adaptive, smarter compiler, we ought to be able
+to make it faster.  We think faster than CPython is a realistic goal,
+eventually.  When faster algorithms are discovered, we will be able to
+quickly place them in the interpreter, because the components are
+more or less independent.  This is something that Armin Rigo
+and Christian Tismer know a lot about, and I know very little.
+
+Dream number Ten:  world domination and ....
+(Well, if we can pull off Dreams 1-9, this should just drop out of the
+design...)
+
+And if we don't pull this off, we will have at least learned a lot. This
+in itself makes the project worth doing.  Plus it's fun...
+
+But away from the dreams: what do we currently have?
+
+We currently implement (or partially implement) two Object Spaces, and
+have plans to implement a third in short order.
+
+1.      The Trivial Object Space
+
+A PyPy interpreter using the TrivialObjectSpace is an interpreter with
+its own main loop (written in Python), and nothing else.  This main
+loop manipulates real Python objects and all operations are done
+directly on the Python objects. For example, "1" really means "1" and
+when the interpreter encounters the BINARY_ADD bytecode instruction,
+the TrivialObjectSpace will just add two real Python objects together
+using Python's "+". The same for lists, dictionaries, classes... We
+just use Python's own.
+
+This Object Space is only useful for testing the concept of Object Spaces,
+and our interpreter, or even interpreting different kinds of bytecodes.
+This is already done; it is funny to watch "dis.dis" disassembling itself
+painfully slowly.
+
+Getting this to work was a goal of the Hildesheim Sprint February 16-23.
+It demonstrated that our Object Space Concept was viable, and that our
+interpreter worked.
+
+2.      The Standard Object Space
+
+The Standard Object Space is the object space that works just like
+Python's, that is, the object space whose black boxes are real Python
+objects that work as expected. This is where the bulk of the work in
+PyPy has been done to date.  Getting the Standard Object Space to
+work was a goal of the Gothenburg Sprint May 24 - 31.
+
+Specifically we needed to get this code:
+
+aStr = 'hello world'
+print len(aStr)
+
+to run.  We needed types and builtins to work.  This ran, slowly.
+
+Then we added strings.  Getting this code to work was the second
+goal.
+
+### a trivial program to test strings, lists, functions and methods ###
+
+def addstr(s1,s2):
+    return s1 + s2
+
+str = "an interesting string"
+str2 = 'another::string::xxx::y:aa'
+str3 = addstr(str,str2)
+arr = []
+for word in str.split():
+    if word in str2.split('::'):
+        arr.append(word)
+print ''.join(arr)
+print "str + str2 = ", str3
+
+This we accomplished by mid-week.
+
+By the end of the Sprint we produced our first Python program that
+ran under PyPy which simply 'did something we wanted to do' and wasn't
+an artificial goal.  Specifically, it calculated the share of the food
+bill for each of the 9 Sprint participants.
+
+slips=[(1, 'Kals MatMarkn', 6150, 'Chutney for Curry', 'dinner Saturday'),
+       (2, 'Kals MatMarkn', 32000, 'Spaghetti, Beer', 'dinner Monday'),
+       (2, 'Kals MatMarkn', -810, 'Deposit on Beer Bottles', 'various'),
+       (3, 'Fram', 7700, 'Rice and Curry Spice', 'dinner Saturday'),
+       (4, 'Kals MatMarkn', 25000, 'Alcohol-Free Beer, sundries', 'various'),
+       (4, 'Kals MatMarkn', -1570, "Michael's toothpaste", 'none'),
+       (4, 'Kals MatMarkn', -1690, "Laura's toothpaste", 'none'),
+       (4, 'Kals MatMarkn', -720, 'Deposit on Beer Bottles', 'various'),
+       (4, 'Kals MatMarkn', -60, 'Deposit on another Beer Bottle', 'various'),
+       (5, 'Kals MatMarkn', 26750, 'lunch bread meat cheese', 'lunch Monday'),
+       (6, 'Kals MatMarkn', 15950, 'various', 'dinner Tuesday and Thursday'),
+       (7, 'Kals MatMarkn', 3650, 'Drottningsylt, etc.', 'dinner Thursday'),
+       (8, 'Kals MatMarkn', 26150, 'Chicken and Mushroom Sauce', 'dinner Wed'),
+       (8, 'Kals MatMarkn', -2490, 'Jacob and Laura -- juice', 'dinner Wed'),
+       (8, 'Kals MatMarkn', -2990, "Chicken we didn't cook", 'dinner Wednesday'),
+       (9, 'Kals MatMarkn', 1380, 'fruit for Curry', 'dinner Saturday'),
+       (9, 'Kals MatMarkn', 1380, 'fruit for Curry', 'dinner Saturday'),
+       (10, 'Kals MatMarkn', 26900, 'Jansons Frestelse', 'dinner Sunday'),
+       (10, 'Kals MatMarkn', -540, 'Deposit on Beer Bottles', 'dinner Sunday'),
+       (11, 'Kals MatMarkn', 22650, 'lunch bread meat cheese', 'lunch Thursday'),
+       (11, 'Kals MatMarkn', -2190, 'Jacob and Laura -- juice', 'lunch Thursday'),
+       (11, 'Kals MatMarkn', -2790, 'Jacob and Laura -- cereal', 'lunch Thurs'),
+       (11, 'Kals MatMarkn', -760, 'Jacob and Laura -- milk', 'lunch Thursday'),
+       (12, 'Kals MatMarkn', 18850, 'lunch bread meat cheese', 'lunch Friday'),
+       (13, 'Kals MatMarkn', 18850, 'lunch bread meat cheese', 'guestimate Sun'),
+       (14, 'Kals MatMarkn', 18850, 'lunch bread meat cheese', 'guestimate Tues'),
+       (15, 'Kals MatMarkn', 20000, 'lunch bread meat cheese', 'guestimate Wed'),
+       (16, 'Kals MatMarkn', 42050, 'grillfest', 'dinner Friday'),
+       (16, 'Kals MatMarkn', -1350, 'Deposit on Beer Bottles', 'dinner Friday'),
+       (17, 'System Bolaget', 15500, 'Cederlunds Caloric', 'dinner Thursday'),
+       (17, 'System Bolaget', 22400, '4 x Farnese Sangiovese 56SEK', 'various'),
+       (17, 'System Bolaget', 22400, '4 x Farnese Sangiovese 56SEK', 'various'),
+       (17, 'System Bolaget', 13800, '2 x Jacobs Creek 69SEK', 'various'),
+       (18, 'J and Ls winecabinet', 10800, '2 x Parrotes 54SEK', 'various'),
+       (18, 'J and Ls winecabinet', 14700, '3 x Saint Paulin 49SEK', 'various'),
+       (18, 'J and Ls winecabinet', 10400, '2 x Farnese Sangioves 52SEK',
+        'cheaper when we bought it'),
+       (18, 'J and Ls winecabinet', 17800, '2 x Le Poiane 89SEK', 'various'),
+       (18, 'J and Ls winecabinet', 9800, '2 x Something Else 49SEK', 'various'),
+       (19, 'Konsum', 26000, 'Saturday Bread and Fruit', 'Slip MISSING'),
+       (20, 'Konsum', 15245, 'Mooseburgers', 'found slip'),
+       (21, 'Kals MatMarkn', 20650, 'Grilling', 'Friday dinner'),
+       (22, 'J and Ls freezer', 21000, 'Meat for Curry, grilling', ''),
+       (22, 'J and Ls cupboard', 3000, 'Rice', ''),
+       (22, 'J and Ls cupboard', 4000, 'Charcoal', ''),
+       (23, 'Fram', 2975, 'Potatoes', '3.5 kg @ 8.50SEK'),
+       (23, 'Fram', 1421, 'Peas', 'Thursday dinner'),
+       (24, 'Kals MatMarkn', 20650, 'Grilling', 'Friday dinner'),
+       (24, 'Kals MatMarkn', -2990, 'TP', 'None'),
+       (24, 'Kals MatMarkn', -2320, 'T-Gul', 'None')
+       ]
+
+print [t[2] for t in slips]
+print (reduce(lambda x, y: x+y, [t[2] for t in slips], 0))/900
+
+PyPy said: 603.  Dinner for a week cost 603 Swedish kronor per person --
+or approximately $50 US.  So if we can't have world domination, or get
+our Object Space to work, a new career in Sprint cost control beckons. :-)
+
+3.      The Translate Object Space
+
+The Translate Object Space is the next goal.  It is an example of an
+ObjectSpace that differs a lot from StandardObjectSpace.  We have to
+translate the Python code we have into C code. This is the sine qua
+non for our work to be actually usable. Quite unexpectedly,
+the major piece of the translator is itself an object space, the
+TranslateObjectSpace. Its goal is to run any Python code and produce C
+code in the background as it does so.
+
+Specifically, we take our PyPy interpreter with the Translate Object
+Space instead of the Standard Object Space, and run that, asking it to
+interpret our generated bytecode. A wrapped object is now the name of
+a variable in the C program we are emitting, for example:
+
+        The add method in the Translate Object Space takes two variable
+names, x and y, and emits the C code z = x + y; where z is a new variable
+name which is returned as the result of add. (We will  actually need to
+make the wrapped objects a bit more elaborate so that we can also record,
+besides the C variable name, its basic type).
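+
+The idea can be sketched as follows, in present-day Python.  The class
+TinyTranslateSpace, the generated names v0, v1, ... and the choice of
+the C type long are assumptions made for this example; they are not
+taken from the TranslateObjectSpace itself:
+
```python
import itertools


class TinyTranslateSpace:
    """Hypothetical sketch: wrapped objects are C variable names."""

    def __init__(self):
        self.lines = []                      # emitted C statements
        self._counter = itertools.count()

    def _new_name(self):
        return "v%d" % next(self._counter)

    def wrap(self, value):
        # Assumes an integer constant; a fuller version would record types.
        name = self._new_name()
        self.lines.append("long %s = %d;" % (name, value))
        return name

    def add(self, x, y):
        # x and y are variable names; the result is a fresh variable name.
        z = self._new_name()
        self.lines.append("long %s = %s + %s;" % (z, x, y))
        return z


space = TinyTranslateSpace()
w_result = space.add(space.wrap(1), space.wrap(2))
c_source = "\n".join(space.lines)
```
+
+Nothing is computed while this runs: add returns only the name of the
+variable that will hold the sum once the emitted C code is compiled and
+executed.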
+
+At the time of this writing, it is not clear whether this is too
+ambitious a goal for the Third Sprint, held in Louvain-la-Neuve,
+Belgium (near Brussels), June 21 - 24.
+
+------------
+
+More details on how we actually do this stuff.
+
+A crucial concept is Multimethods  (yanked from the wiki)
+
+Interpreter-level classes correspond to implementations of
+application-level types.  The hierarchy among the classes used for the
+implementations is convenient for implementation purposes. It is not
+related to any application-level type hierarchy.
+
+Dispatch
+
+Multimethods dispatch by looking in a set of registered
+functions. Each registered function has a signature, which defines
+which object implementation classes are accepted at the corresponding
+argument position.
+
+The name 'W_ANY' is a synonym for 'W_Object' (currently, possibly
+'object' later). As it accepts anything, it is the only way to
+guarantee that the registered function will be called with exactly the
+same object as was passed originally. ATTENTION: in all other cases
+the argument received by the function may have been converted in some
+way. It must thus not be considered to be 'id'entical to the original
+argument. For example it should not be stored in a data structure, nor
+be queried for type, nor be used for another multimethod dispatch --
+the only thing you should do is read and write its internal data.
+
+For example, 'getattr(obj, attr)' is implemented with a W_StringObject
+second argument when all it needs is just the name of the attr, and
+with a W_ANY when the 'attr' object could be used as a key in
+obj.__dict__.
+
+Delegation
+
+Delegation is a transparent conversion mechanism between object
+implementations. The conversion can give a result of a different type
+(e.g. int -> float) or of the same type (e.g. W_VeryLongString ->
+str). There is a global table of delegators. We should not rely on the
+delegators to be tried in any particular order, or at all (e.g. the int
+-> float delegator could be ignored when we know that no registered
+function will accept a float anyway).
+
+Delegation is also used to emulate inheritance between built-in types
+(e.g. bool -> int). This is done by delegation because there is no
+reason that a particular implementation of a sub-type can be trivially
+typecast to some other particular implementation of the parent type;
+the process might require some work.
+
+Types
+
+Types are implemented by the class W_TypeObject. This is where
+inheritance and the Method Resolution Order are defined, and where
+attribute look-ups are done.
+
+Instances of user-defined types are implemented as W_UserObjects. A
+user-defined type can inherit from built-in types (maybe more than
+one, although this is incompatible with CPython). The W_UserObject
+delegator converts the object into any of these "parent objects" if
+needed. This is how user-defined types appear to inherit all built-in
+operator implementations.
+
+Delegators should be able to invoke user code; this would let us
+implement special methods like __int__() by calling them within a
+W_UserObject -> int delegator.
+
+Specifics of multimethods
+
+Multimethods dispatch more-specific-first, left-to-right (i.e. if
+there is an exact match for the first argument it will always be tried
+first).
+
+Delegators are automatically chained (i.e. A -> B and B -> C would be
+combined to allow for A -> C delegation).
+
+Delegators do not publish the class of the converted object in
+advance, so that the W_UserObject delegator can potentially produce
+any other built-in implementation. This means chaining and chain loop
+detection cannot be done statically (at least without help from an
+analysis tool like the translator-to-C). To break loops, we can assume
+(unless a particular need arises) that delegators are looping when
+they return an object of an already-seen class.
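+
+Chaining with this loop-breaking rule might look like the sketch below,
+in present-day Python.  The function delegation_chain and the table with
+one delegator per class are simplifications invented here, and Python's
+own bool, int and float stand in for the W_* implementation classes:
+
```python
def delegation_chain(obj, delegators):
    """Return obj followed by every object reachable through delegators.

    `delegators` maps a class to a function converting an instance of that
    class into some other object.  A conversion whose result has an
    already-seen class is treated as a loop and ends the chain.
    """
    chain = [obj]
    seen = {type(obj)}
    while type(chain[-1]) in delegators:
        converted = delegators[type(chain[-1])](chain[-1])
        if type(converted) in seen:
            break                      # assume the delegators are looping
        seen.add(type(converted))
        chain.append(converted)
    return chain


# Hypothetical delegators: bool -> int -> float -> int (the last one loops).
delegators = {bool: int, int: float, float: int}
chain = delegation_chain(True, delegators)
chain_types = [type(o) for o in chain]
```
+
+Because the class of each converted object is only known after the
+delegator has run, the check has to happen dynamically, as the text
+above describes.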
+
+Registration
+
+The register() method of multimethods adds a function to its database
+of functions, with the given signature. A function that raises
+FailedToImplement causes the next match to be tried.
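+
+A minimal sketch of registration and dispatch, in present-day Python.
+The class MultiMethod below is written for this example and is not the
+code in multimethod.py; it leaves out delegation, and it uses object in
+the role of W_ANY and Python's own classes in place of the W_*
+implementation classes:
+
```python
class FailedToImplement(Exception):
    """Raised by a registered function to pass to the next match."""


class MultiMethod:
    def __init__(self, name):
        self.name = name
        self.registry = []          # list of (signature, function)

    def register(self, function, *signature):
        self.registry.append((signature, function))

    def __call__(self, *args):
        matches = [
            (signature, function)
            for signature, function in self.registry
            if len(signature) == len(args)
            and all(isinstance(a, c) for a, c in zip(args, signature))
        ]
        # More specific first, left to right: an exact class match (MRO
        # distance 0) on an earlier argument sorts before a looser one.
        matches.sort(key=lambda m: [type(a).__mro__.index(c)
                                    for a, c in zip(args, m[0])])
        for _, function in matches:
            try:
                return function(*args)
            except FailedToImplement:
                continue
        raise TypeError("no implementation of %s for %r" % (self.name, args))


def add_int_int(x, y):
    if abs(x) > 1000:
        raise FailedToImplement   # pretend large values are unsupported here
    return "int+int"


def add_any_any(x, y):
    return "generic"


add = MultiMethod("add")
add.register(add_any_any, object, object)
add.register(add_int_int, int, int)

small = add(1, 2)          # exact match is tried first
large = add(5000, 2)       # exact match fails, next match is tried
other = add("a", "b")      # only the generic signature accepts strings
```
+
+The order of registration does not matter here: the (int, int)
+signature is tried before (object, object) because it is more specific.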
+
+'delegate' is the special unary multimethod that should try to convert
+its argument to something else. For greater control, it can also
+return a list of 2-tuples (class, object), or an empty list for
+failure to convert the argument to anything. All delegators will
+potentially be tried, and recursively on each other's results to do
+chaining.
+
+A priority ordering between delegators is used. See objspace.PRIORITY_*.
+
+Translation
+
+The code in multimethod.py is not supposed to be read by the
+translator-to-C. Special optimized code will be generated instead
+(typically some kind of precomputed dispatch tables).
+
+Delegation is special-cased too. Most delegators will be found to
+return an object of a statically known class, which means that most of
+the chaining and loop detection can be done in advance.
+
+Multimethod slicing
+
+Multimethods are visible to user code as (bound or unbound) methods
+defined for the corresponding types. (At some point built-in functions
+like len() and the operator.xxx() should really directly map to the
+multimethods themselves, too.)
+
+To build a method from a multimethod (e.g. as in 'l.append' or
+'int.__add__'), the result is actually a "slice" of the whole
+multimethod, i.e. a sub-multimethod in which the registration table
+has been trimmed down. (Delegation mechanisms are not restricted for
+sliced multimethods.)
+
+Say that C is the class the new method is attached to (in the above
+examples, respectively, C=type(l) and C=int). The restriction is based
+on the registered class of the first argument ('self' for the new
+method) in the signature. If this class corresponds to a fixed type
+(as advertized by 'statictype'), and this fixed type is C or a
+superclass of C, then we keep it.
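+
+The restriction on the first argument can be sketched like this, in
+present-day Python.  The function slice_registry and the sample table
+are hypothetical, the registered class itself stands in for whatever
+'statictype' advertises, and strings stand in for the registered
+functions:
+
```python
def slice_registry(registry, cls):
    """Keep entries whose first-argument class is cls or a superclass of it."""
    return [(signature, function)
            for signature, function in registry
            if issubclass(cls, signature[0])]


# Hypothetical registration table for an 'add' multimethod.
registry = [
    ((int, int), "int_add"),
    ((float, float), "float_add"),
    ((object, object), "generic_add"),
]

int_slice = [name for _, name in slice_registry(registry, int)]
float_slice = [name for _, name in slice_registry(registry, float)]
bool_slice = [name for _, name in slice_registry(registry, bool)]
```
+
+The bool slice keeps int_add because int is a superclass of bool, while
+float_add is trimmed from every slice except the one for float.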
+
+Some multimethods can also be sliced along their second argument,
+e.g. for __radd__().

