[pypy-svn] rev 2112 - in pypy/trunk/doc: irclog translation

Tue Oct 28 18:21:29 CET 2003

Author: arigo
Date: Tue Oct 28 18:21:28 2003
New Revision: 2112

Added:
   pypy/trunk/doc/irclog/
   pypy/trunk/doc/irclog/annotations.txt
      - copied, changed from rev 2111, pypy/trunk/doc/translation/annotations.txt
   pypy/trunk/doc/irclog/llvm.txt
Removed:
   pypy/trunk/doc/translation/annotations.txt
Log:
Added another irc log.
Moved the previous one in a new directory 'doc/irclog'.
Added very short abstracts.


Copied: pypy/trunk/doc/irclog/annotations.txt (from rev 2111, pypy/trunk/doc/translation/annotations.txt)
==============================================================================

--- pypy/trunk/doc/translation/annotations.txt	(original)
+++ pypy/trunk/doc/irclog/annotations.txt	Tue Oct 28 18:21:28 2003
@@ -1,6 +1,11 @@
 About annotations
 =================
 
+We are running into limitations of the annotation system used for type inference.
+This document describes these limitations and how to shift the concepts around
+slightly to fix them, and probably also how the whole issue arose from having
+mixed up concepts in the wrong ways in the first place.
+
 Irc log from October, the 28th::
 
   <arigo> sanxiyn: ok for a few words about annotations?

Added: pypy/trunk/doc/irclog/llvm.txt
==============================================================================
--- (empty file)
+++ pypy/trunk/doc/irclog/llvm.txt	Tue Oct 28 18:21:28 2003
@@ -0,0 +1,167 @@
+LLVM
+====
+
+First discussion about using LLVM as a target language.
+LLVM (Low Level Virtual Machine) is a Compiler Infrastructure;
+see http://llvm.cs.uiuc.edu/.
+
+Irc log from October, the 28th::
+
+  <stackless>	So I smell something growing here...
+  <sanxiyn>	arigo: thanks. I lost some of up-logs... so I asked.
+  <arigo>	stackless: nice
+  <stackless>	on that assembly target: How is their source code? Had no time to look. I hope
+  <stackless>	they don't use huge ugly other languages like ML?
+  <sanxiyn>	stackless: good for you! I thank Richard Emslie, I thank Richard Emslie (he repeats)
+  <hpk>	arigo: uh, bob ippolito just wrote that LLVM is all C++
+  <arigo>	stackless: i doubt it
+  <sanxiyn>	Yep. LLVM is in C++.
+  <sanxiyn>	arigo: so logging & summary is for you (evil grin)
+  <arigo>	sanxiyn: yes
+  <arigo>	stackless: i think they are using fast custom back-ends for runtime code generation
+  <stackless>	arigo: that sounds like what I like.
+  <arigo>	stackless: they also mentioned grabbing parts of GCC
+  <hpk>	who bothers - we have to have some binding with C++ then :-)
+  <arigo>	stackless: or ideas and AST structures at least
+  <hpk>	but they seemed to like to move away from it (because of licensing issues)
+  <stackless>	well, they might like PyPy and decide to become part of the project, supporting us.
+  *	sanxiyn baffles, "ML is neither huge nor ugly!"
+  <arigo>	stackless: yes !
+  <arigo>	in all cases i think that a genllvm.py should be easy to write
+  <hpk>	right
+  <arigo>	and if their compilers are good it could be faster than C
+  *	stackless apologises, didn't mean ML, probably. But last time he looked into C--, he was unhappy to pull so much things in...
+  <arigo>	because it has a lot of meta-information
+  <arigo>	not only types, but single-step-assignment guarantees no aliasing, whereas GCC tries hard to find out what could alias what
+  <stackless>	single-step-assignment is one thing I remember from C--
+  <arigo>	yes
+  <arigo>	it's a good idea
+  <stackless>	really good. They never have expressions in function calls.
+  <arigo>	and it's natural for intermediate languages like our flow graphs
+  <stackless>	Instead, order of evaluation is crystal clear.
+  <sanxiyn>	I think FlowModel has that property too...
+  <arigo>	yes
+  <sanxiyn>	since it's derived from Python bytecode... etc.
+  <arigo>	interesting stuff from the e-mail at http://mail.cs.uiuc.edu/pipermail/llvmdev/2003-October/000501.html
+  <arigo>	"programs which have high-degree basic blocks"
+  <arigo>	high-degree means (unless i'm mistaken) a lot of inputargs
+  <arigo>	we have a lot of them indeed
+  <hpk>	yes!
+  <hpk>	that's what'
+  <arigo>	that's a problem when using languages like ML as intermediate languages
+  <arigo>	you can write functions with 23 arguments
+  <arigo>	but the compiler isn't optimized for that
+  <arigo>	i tried, it produces bad code :-)
+  <sanxiyn>	Ah, I heard it from Lisp gotchas, i.e. it's easier to write slow code in Lisp.
+  <sanxiyn>	(it specifically mentioned problem with multiple value return optimization. sounds similar.)
+  <sanxiyn>	btw, what is SSA...
+  <arigo>	single-step assignment (never write to a variable more than once)
+  <sanxiyn>	I'm not sure how does it help, but I don't know much about this area.
+  <hpk>	i am just skimming the source code
+  <hpk>	looks nice and readable
+  <hpk>	and good inline documentation it seems
+  <hpk>	it is c++ though :-)
+  <sanxiyn>	arigo: Was thinking more about SpaceOp/Annset. It's a constraint-based programming.
+  <arigo>	yes, constraint propagation...
+  <sanxiyn>	That's what Screamer (sorry, don't know about others. this one is Lisp) do very well...
+  <sanxiyn>	Integer range analysis and all goodies.
+  *	hpk has not often seen such nice c++ code ...
+  <sanxiyn>	So it's not really a new idea. But that means we have lots of experience to learn from.
+  <arigo>	sanxiyn: yes
+  *	sanxiyn loads screamer intro he downloaded but have never read.
+  <hpk>	hmmm, it's really a high level c++ code, probably pretty easy to convert to python (the parts i have seen)
+  <sanxiyn>	How much code is LLVM?
+  <hpk>	i have no idea
+  <hpk>	i just read the commit mails
+  <sanxiyn>	ls
+  <sanxiyn>	PyPy is currently 39844 lines of code.
+  <sanxiyn>	(22000 of them is PyPy, 16000 Pyrex.)
+  <hpk>	what? 
+  <hpk>	16000 pyrex? what do you mean? 
+  <sanxiyn>	Plex + Pyrex is 16000 lines.
+    
+    Oct 28 16:10:18 -->	pedronis (~sp at 91.51.202.62.dial.bluewin.ch) has joined #pypy
+  <sanxiyn>	Hello.
+  <pedronis>	hi
+  <hpk>	pedronis: hi samuele
+  <arigo>	hi samuele
+  *	sanxiyn downloads LLVM 1.0
+  <sanxiyn>	hpk: what do you think about line count? :)
+  *	arigo downloads LLVM 1.0 too
+  <pedronis>	why do we need to be so fast with LLVM, is it because they want to set up a public repo and we want to offer hosting it?
+  <sanxiyn>	we don't need to be hasty, right.
+  <sanxiyn>	hpk: eh. should I register to download?
+  <hpk>	i just did :-)
+  <arigo>	so did i :-)
+  <hpk>	with real name and all :-)
+  <sanxiyn>	me too.
+  <hpk>	pedronis: it can't hurt to contact them informally and see/talk about ideas i think
+  <sanxiyn>	well. it's *huge*;
+  <hpk>	pedronis: if we find out that we were over-enthusiastic we have not lost much, i think
+  <arigo>	pedronis: i think their project is interesting, for PyPy or not, and holger talked about offering hosting
+  <arigo>	pedronis: but mostly i'm sure if llvm is well written it is excellent for PyPy
+  <arigo>	pedronis: this needs to be checked and discussed of course
+  <pedronis>	arigo: what I'm not sure, and we should ask is how much they are interested in optimization for VHLL
+  <arigo>	as opposed to C-like languages ?
+  <pedronis>	arigo: yup, it seems that LLVM needs to be extended for things like exact GC, or some possible lookup opts for VHLL
+  <arigo>	yes, i think the VHLL is supposed to do language-specific optimizations itself
+  *	sanxiyn mentions Parrot... not.
+  <arigo>	and only emit a low-level code that contains enough information for good low-level optimization
+  <sanxiyn>	Parrot is the only explicitly VHLL VM I know of.
+  <pedronis>	arigo: it seems they are interested in things like region-based memory allocation etc
+  <arigo>	yes, which is fine i think
+  <pedronis>	arigo: which goes more in the device driver, OS kernel direction
+  <hpk>	quote: The Python test classes are more UNIX-centric than they should be, so porting to non-UNIX like platforms 
+  <arigo>	we can have refcounted regions and garbage-collected ones
+  <hpk>	(i thought it's interesting that they are using python for something :-)
+  <sanxiyn>	hpk: Many projects use Python for unittesting, but usually they have not much to do with Python.
+  <sanxiyn>	For example, svn uses Python for unittesting.
+  <hpk>	sanxiyn: sure, but it's still significant information 
+  <arigo>	pedronis: llvm is definitely a low-level tool
+  <hpk>	and BIND and whatnot
+  <sanxiyn>	Yes. It tells us they know about Python. :)
+  <pedronis>	arigo: yes, the point is whether they are happy extending it to support non-low-level stuff
+  <arigo>	pedronis: i'm thinking about it at least as a very good alternative to C for the translator
+  <arigo>	pedronis: but i think they would be happy to design some "hooks" needed for high-level languages
+  <arigo>	pedronis: they don't have Java yet for example but mention wanting to look in that direction
+  <pedronis>	arigo: OK, so using their static compiler?
+  <arigo>	pedronis: at least
+  <arigo>	pedronis: we should try to write "genllvm.py"
+  <sanxiyn>	If RPython can be translated to C, it surely can be translated to LLVM.
+  <sanxiyn>	And moreover, as Psyco do (perhaps I'm wrong here), some Applevel Python function may be able to be JITted by (LLVM or whatever).
+  <arigo>	pedronis: i think the experiment is worth being made
+  <arigo>	sanxiyn: yes, that's what is beyond my "at least" :-)
+  <pedronis>	arigo: well the experiment is cheap
+  <sanxiyn>	arigo: Will you post log and summary for binding concept and forward-dependency, constraint-based programming?
+  <arigo>	sanxiyn: yes
+  <arigo>	pedronis: yes
+  <sanxiyn>	topic is moving farther and farther from that.
+  <arigo>	sanxiyn: i've saved the relevant parts, will edit them when i've a minute
+  <sanxiyn>	ah, ok.
+  <pedronis>	arigo: my issue is how much their JIT is usable and drivable at runtime, and integration with things like GC etc
+  <pedronis>	arigo: OTOH yes as target of the translator, that's another situation
+  <arigo>	pedronis: yes for the JIT it needs more investigation
+  <arigo>	pedronis: for full Psyco i'd need compilation of basic-blocks-at-a-time (not whole functions at a time)
+  <pedronis>	arigo: yes, I know that, is one of the thing I was wondering about
+  <sanxiyn>	I remember Psyco does very complex things to accomplish that.
+  <arigo>	pedronis: right now i'm pretty enthusiastic because the LLVM language is just the same as our flowgraphs, so we could probably at least have a JIT for RPython
+  
+    Oct 28 16:31:22 -->	faassen (~faassen at a213-84-57-72.adsl.xs4all.nl) has joined #pypy
+  <arigo>	hi martijn
+  <pedronis>	arigo: yes or just static compilation
+  <pedronis>	arigo: it seems they are investigating trace-based techniques like Dynamo
+  <arigo>	pedronis: actually, i don't know many projects with a good runtime compiler that accepts an in-memory SSA representation of code
+  <faassen>	hey.
+  <arigo>	pedronis: this alone makes llvm interesting, for many projects that I can think about besides or on top of PyPy
+  <sanxiyn>	So LLVM is already a rare case?
+  <hpk>	what really impresses me is how their website and the source code is done
+  <hpk>	faassen: hi martijn
+  <faassen>	hpk: hey! :)
+  <arigo>	pedronis: trace techniques are nice, Psyco's profiler is a bit primitive
+  <sanxiyn>	website is impressive. I don't know C++ very well to judge the code. :(
+  <hpk>	sanxiyn: trust me it's better than average :-)
+  <faassen>	what website is that? :)
+  <arigo>	pedronis: at this point i think we should at least consider using llvm even if we have to change a bit the C++ code to add a couple of instructions.
+  <hpk>	http://llvm.cs.uiuc.edu/#subprojects
+    
+... cut at Martijn's arrival :-)
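
A note on the single-assignment property discussed in the log above (never
write to a variable more than once): the following is a minimal Python sketch,
not code from PyPy or LLVM. The (target, opname, args) tuple layout and the
name to_single_assignment are assumptions made for illustration only. It
handles one straight-line block and does not deal with merging values coming
from several blocks.

```python
def to_single_assignment(ops):
    """Rename targets so that every variable is assigned exactly once.

    `ops` is a list of (target, opname, args) tuples for one straight-line
    block. This layout is a hypothetical stand-in, not PyPy's SpaceOperation.
    String arguments are variable names; anything else is a constant.
    """
    version = {}   # source name -> number of assignments seen so far
    current = {}   # source name -> latest renamed variable
    result = []
    for target, opname, args in ops:
        # Arguments read the latest version of a name; names never assigned
        # in this block are block inputs and are left unchanged.
        new_args = tuple(
            current.get(a, a) if isinstance(a, str) else a for a in args
        )
        version[target] = version.get(target, 0) + 1
        new_target = "%s_%d" % (target, version[target])
        current[target] = new_target
        result.append((new_target, opname, new_args))
    return result


ops = [
    ("x", "add", ("a", "b")),
    ("x", "mul", ("x", 2)),     # second write to x becomes a fresh variable
    ("y", "add", ("x", "a")),
]
for op in to_single_assignment(ops):
    print(op)
```

After renaming, each name has a single definition, so a reader of a variable
never has to ask which earlier write it sees. The sketch assumes no input
name already looks like "x_1".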

Deleted: /pypy/trunk/doc/translation/annotations.txt
==============================================================================
--- /pypy/trunk/doc/translation/annotations.txt	Tue Oct 28 18:21:28 2003
+++ (empty file)
@@ -1,169 +0,0 @@
-About annotations
-=================
-
-Irc log from October, the 28th::
-
-  <arigo> sanxiyn: ok for a few words about annotations?
-  <sanxiyn> yep!
-  <sanxiyn> (sorry for being out; I forgot it...)
-  <arigo> np
-  <arigo> mutable structures pose some problems
-  <sanxiyn> e.g.
-  <arigo> because you cannot say "len(x) = 5" if 'x' is a list, of course
-  <arigo> because the length of x could change
-  <arigo> so just propagating the annotation is wrong
-  <sanxiyn> ah.
-  <arigo> it's more annoying to say e.g. that x is a list of integers
-  <sanxiyn> Is it annoying?
-  <arigo> getitem(x, anything) = y & type(y) = int
-  <sanxiyn> yep.
-  <arigo> but what if you call f(x)
-  <arigo> and f adds strings to the list x ?
-  <sanxiyn> I think RPython list shall be homogenous.
-  <arigo> yes, but:
-  <arigo> x = []
-  <arigo> f(x)
-  <arigo> then f is allowed to put strings in x
-  <sanxiyn> ah, empty list thing...
-  <arigo> yes but also:
-  <arigo> x = ['hello']
-  <arigo> f(x)
-  <sanxiyn> ML languages have precisely the same problem, aren't they?
-  <arigo> yes but i think we can solve it here
-  <arigo> but we need to be careful
-  <sanxiyn> special casing empty list should work. (IIRC that's how it's done in ML, basically)
-  <arigo> yes but i think we can solve it here (didn't i say that already :-)
-  <sanxiyn> agreed. so let's solve it;
-  <sanxiyn> :)
-  <arigo> won't help verbosity, but let's think bout that later.
-  <sanxiyn> List length seems to be impossible to guarantee.
-  
-  <arigo> we can say:
-  <arigo> deref(x) = z ; getitem(z, anything) = y ; type(y) = int
-  <arigo> here x is our variable, but z is a Cell()
-  <arigo> so the list has a life of its own, independently from the variable it is in
-  * sanxiyn reads it carefully.
-  <arigo> what i'm thinking about is this:
-  <arigo> we would have (conceptually) a single big pool of annotation
-  <arigo> not one AnnotationSet per basic block
-  <arigo> only one, for the whole program
-  <sanxiyn> Yes. I found annset per block annoying, and felt that it's that way for no real reason.
-  <arigo> we would map variables to this big annotation set
-  <arigo> this must probably still be done for each block independently
-  <arigo> each block would have a map {variable: cell-in-the-big-annset}
-  <arigo> or maybe not
-  <sanxiyn> hm
-  <arigo> because variables are supposed to be unique anyway
-  <arigo> still, i think the big annset should not use variables at all, just cells and constants.
-  <sanxiyn> comments in get_variables_ann say otherwise, but I suspect it's outdated.
-  <arigo> "supposed" to be unique... no, they still aren't really
-  <sanxiyn> eh, confused.
-  <arigo> the comment is not outdated
-  <sanxiyn> what does it mean, then?
-  <arigo> the same Variable() is still used in several blocks
-  <arigo> that should be fixed
-  <sanxiyn> indeed.
-  <sanxiyn> I commented out XXX: variables must not be shared, and ran test_pyrextrans, and got 6 failures.
-  <arigo> yes
-  <arigo> all EggBlocks are wrong, currently
-  <sanxiyn> I don't know what Spam/Egg Blocks are.
-  <arigo> :-)
-  <sanxiyn> Don't know at all.
-  <arigo> it's funny names describing how the block was built
-  <arigo> they are all Blocks
-  <arigo> an EggBlock is used after a fork
-  <sanxiyn> fork?
-  <arigo> a split, after a block with two exits
-  <arigo> but that's not relevant to the other transformations
-  <arigo> which can simplify the graph after it is built
-  
-  <arigo> we could have a single big annset
-  <arigo> it represents "the heap" of an abstract CPython process
-  <sanxiyn> hm.
-  <arigo> i.e. objects in the heap
-  <arigo> like lists, integers, all of them
-  <arigo> using Cell() to represent abstract objects, and Constant() for concrete ones
-  <arigo> then a variable is only something which appears in the basic block's SpaceOperations
-  * arigo is confused
-  <sanxiyn> So Variable() points to Cell().
-  <arigo> yes...
-  <arigo> currently we cannot handle mutable lists because:
-  <arigo> getitem(x, *) = z
-  <arigo> is an annotation talking about the variable x
-  <arigo> so we cannot propagate the annotation forth and back to called sub-functions
-  <arigo> instead, getitem should talk about an object, not the variable that points to it
-  <sanxiyn> exactly!
-  <sanxiyn> That's Python-think. :)
-  <sanxiyn> http://starship.python.net/crew/mwh/hacks/objectthink.html
-  <sanxiyn> Is mwh's wonderful piece "How to think like a Pythonista" relevant here?
-  * arigo tries to do 4 things at the same time and fails to
-  <sanxiyn> So variables are names.
-  <sanxiyn> It binds.
-  <arigo> yes
-  <sanxiyn> mwh wrote: "I find the word variable to be particularly unhelpful in a Python context..."
-  <sanxiyn> with wonderful diagrams :)
-  <hpk> yah, introducing namespaces into abstract-interpretation world! :-)
-  <sanxiyn> namespace? eh, not exactly, I think...
-  <arigo> hpk: yes, each block is its own namespace here :-)
-  <arigo> and obviously we need "heap objects" that these names can refer to
-  <hpk> (namespaces in the meaning of "living" bindings between names and objects)
-  <sanxiyn> So "objects" are actually cells unless constant-propagated...
-  <arigo> yes...
-  <arigo> i think we could even go for a full-Prolog representation:
-  <arigo> the "big heap" contains cells and constants.  cells can become constants when we know more about them.
-  * sanxiyn should read Borges and Calvino as Martellibot suggested. :)
-  <arigo> seems cleaner than the current cell-variable-constant mix.
-  <arigo> in other words, a SpaceOperation uses variables only,
-  <arigo> and the variable can refer to a cell or a constant from the heap...
-  <arigo> the point is that the objects in the heap can be manipulated
-  <arigo> say a variable v1 points to a cell c
-  <arigo> with type(c) = list and len(c) = 3
-  <sanxiyn> v2 = v1 and v1 points to the same cell c.
-  <sanxiyn> you modify v2 and v1 is modified, too, etc.
-  <arigo> yes exactly
-  <arigo> if you append an item to the list then the annotation len(c) = 3 is deleted
-  <sanxiyn> Is "prolog" a pronoun for "non-determinism"?
-  <arigo> Logic Programming i think
-  
-  <sanxiyn> arigo: I think that solves "reflow".
-  <arigo> sanxiyn: yes, possibly
-  <arigo> you can add annotations freely, at least
-  <arigo> that's fine
-  <arigo> we'll just need a trick to delete ("retract") annotations
-  <arigo> because other annotations may depend on this one
-  <arigo> like type(c3)=int is only valid if type(c1)=int and type(c2)=int because we used an 'add' operation
-  <sanxiyn> Currently flowin does similar thing.
-  <sanxiyn> It recomputes all annotations if len(annset) is decreased.
-  <arigo> sanxiyn: yes, but it should work without the need to re-flowin
-  <sanxiyn> eh?
-  <sanxiyn> without re-flowin?
-  <arigo> if you delete an annotation, then you must recompute annotations recursively on the rest of the graph
-  <sanxiyn> yes, how to avoid that?
-  <arigo> we can record dependencies
-  <arigo> each annotation "knows" that it depends on some other ones
-  <hpk> question is if there are different ways of "depending" or just one way
-  <arigo> hpk: right
-  <hpk> in a way a space operation modifying the assertions denotes 'edges' in this dependency graph? 
-  <arigo> yes
-  <sanxiyn> I think annotation should know about *others* which depend on itself, not which itself depends on.
-  <arigo> yes
-  <arigo> when you kill an annotation, just follow the forward dependencies to kill the ones that depend on it
-  <sanxiyn> So not dependency... reverse dependency? :)
-  <arigo> forward dependency... ?
-  <sanxiyn> Should be easy to add.
-  <hpk> "reasons"? 
-  <hpk> origin? 
-  <sanxiyn> hpk: no, consequences.
-  <sanxiyn> hpk: neither reason nor origin.
-  <arigo> "dependents" ?
-  <sanxiyn> As in SF novel "time patrol", if you change the past, the future is all changed.
-  <sanxiyn> how about consequences? I'm not good at naming...
-  <hpk> too long :-)
-  <sanxiyn> implication
-  <sanxiyn> too long ;
-  <arigo> consequences is fine if you don't have to type it too often :-)
-  <hpk> hmmm. 
-  <arigo> i guess we need an Annotation class whose constructor takes a list of dependencies, and records 'self' in these dependencies' "consequences" or whatever
-  <sanxiyn> I think only deletion routine need to refer it.
-  
-...cut. So if you have a good name for that attribute, speak up :-)
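
A note on the Annotation class suggested at the end of the log above: the
following is a minimal Python sketch, assuming the attribute ends up being
called "consequences" (the name was still open in the discussion). The class
layout, the "alive" flag and the retract() method are hypothetical and are not
the PyPy implementation.

```python
class Annotation:
    """One fact about a heap cell, e.g. "type(c3)=int" (hypothetical layout)."""

    def __init__(self, text, dependencies=()):
        self.text = text
        self.alive = True
        # Forward links: annotations that are only valid while this one is.
        self.consequences = []
        for dep in dependencies:
            # Record 'self' in each dependency, as proposed in the log.
            dep.consequences.append(self)

    def retract(self):
        """Kill this annotation and everything that was derived from it."""
        if not self.alive:
            return          # already retracted, nothing more to follow
        self.alive = False
        for ann in self.consequences:
            ann.retract()


# type(c3)=int only holds because of an 'add' on two integers.
t1 = Annotation("type(c1)=int")
t2 = Annotation("type(c2)=int")
t3 = Annotation("type(c3)=int", [t1, t2])
length = Annotation("len(c)=3")      # unrelated annotation

t1.retract()
print([a.text for a in (t1, t2, t3, length) if a.alive])
```

Only the deletion path reads the forward links, so adding annotations stays
cheap and no re-flowing of the whole graph is needed in this sketch.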