[pypy-svn] rev 2112 - in pypy/trunk/doc: irclog translation
arigo at codespeak.net
Tue Oct 28 18:21:29 CET 2003
Author: arigo
Date: Tue Oct 28 18:21:28 2003
New Revision: 2112
Added:
pypy/trunk/doc/irclog/
pypy/trunk/doc/irclog/annotations.txt
- copied, changed from rev 2111, pypy/trunk/doc/translation/annotations.txt
pypy/trunk/doc/irclog/llvm.txt
Removed:
pypy/trunk/doc/translation/annotations.txt
Log:
Added another irc log.
Moved the previous one in a new directory 'doc/irclog'.
Added very short abstracts.
Copied: pypy/trunk/doc/irclog/annotations.txt (from rev 2111, pypy/trunk/doc/translation/annotations.txt)
==============================================================================
--- pypy/trunk/doc/translation/annotations.txt (original)
+++ pypy/trunk/doc/irclog/annotations.txt Tue Oct 28 18:21:28 2003
@@ -1,6 +1,11 @@
About annotations
=================
+We are running into limitations of the annotation system used for type inference.
+This document describes these limitations and how to slightly move the concepts
+around to fix them, and probably also how the whole issue arose from having
+mixed concepts in the wrong ways in the first place.
+
Irc log from October, the 28th::
<arigo> sanxiyn: ok for a few words about annotations?
Added: pypy/trunk/doc/irclog/llvm.txt
==============================================================================
--- (empty file)
+++ pypy/trunk/doc/irclog/llvm.txt Tue Oct 28 18:21:28 2003
@@ -0,0 +1,167 @@
+LLVM
+====
+
+First discussion about using LLVM as a target language.
+LLVM (Low Level Virtual Machine) is a Compiler Infrastructure;
+see http://llvm.cs.uiuc.edu/.
+
+Irc log from October, the 28th::
+
+ <stackless> So I smell something growing here...
+ <sanxiyn> arigo: thanks. I lost some of up-logs... so I asked.
+ <arigo> stackless: nice
+ <stackless> on that assembly target: How is their source code? Had no time to look. I hope
+ <stackless> they don't use huge ugly other languages like ML?
+ <sanxiyn> stackless: good for you! I thank Richard Emslie, I thank Richard Emslie (he repeats)
+ <hpk> arigo: uh, bob ippolito just wrote that LLVM is all C++
+ <arigo> stackless: i doubt it
+ <sanxiyn> Yep. LLVM is in C++.
+ <sanxiyn> arigo: so logging & summary is for you (evil grin)
+ <arigo> sanxiyn: yes
+ <arigo> stackless: i think they are using fast custom back-ends for runtime code generation
+ <stackless> arigo: that sounds like what I like.
+ <arigo> stackless: they also mentioned grabbing parts of GCC
+ <hpk> who bothers - we have to have some binding with C++ then :-)
+ <arigo> stackless: or ideas and AST structures at least
+ <hpk> but they seemed to like to move away from it (because of licensing issues)
+ <stackless> well, they might like PyPy and decide to become part of the project, supporting us.
+ * sanxiyn baffles, "ML is neither huge nor ugly!"
+ <arigo> stackless: yes !
+ <arigo> in all cases i think that a genllvm.py should be easy to write
+ <hpk> right
+ <arigo> and if their compilers are good it could be faster than C
+ * stackless apologises, didn't mean ML, probably. But last time he looked into C--, he was unhappy to pull so many things in...
+ <arigo> because it has a lot of meta-information
+ <arigo> not only types, but single-step-assignment guarantees no aliasing, whereas GCC tries hard to find out what could alias what
+ <stackless> single-step-assignment is one thing I remember from C--
+ <arigo> yes
+ <arigo> it's a good idea
+ <stackless> really good. They never have expressions in function calls.
+ <arigo> and it's natural for intermediate languages like our flow graphs
+ <stackless> Instead, order of evaluation is crystal clear.
+ <sanxiyn> I think FlowModel has that property too...
+ <arigo> yes
+ <sanxiyn> since it's derived from Python bytecode... etc.
+ <arigo> interesting stuff from the e-mail at http://mail.cs.uiuc.edu/pipermail/llvmdev/2003-October/000501.html
+ <arigo> "programs which have high-degree basic blocks"
+ <arigo> high-degree mean (unless i'm mistaken) a lot of inputargs
+ <arigo> we have a lot of them indeed
+ <hpk> yes!
+ <hpk> that's what'
+ <arigo> that's a problem when using languages like ML as intermediate languages
+ <arigo> you can write functions with 23 arguments
+ <arigo> but the compiler isn't optimized for that
+ <arigo> i tried, it produces bad code :-)
+ <sanxiyn> Ah, I heard it from Lisp gotchas, i.e. it's easier to write slow code in Lisp.
+ <sanxiyn> (it specifically mentioned problem with multiple value return optimization. sounds similar.)
+ <sanxiyn> btw, what is SSA...
+ <arigo> single-step assignment (never write to a variable more than once)
+ <sanxiyn> I'm not sure how does it help, but I don't know much about this area.
+ <hpk> i am just skimming the source code
+ <hpk> looks nice and readable
+ <hpk> and good inline documentation it seems
+ <hpk> it is c++ though :-)
+ <sanxiyn> arigo: Was thinking more about SpaceOp/Annset. It's a constraint-based programming.
+ <arigo> yes, constraint propagation...
+ <sanxiyn> That's what Screamer (sorry, don't know about others. this one is Lisp) do very well...
+ <sanxiyn> Integer range analysis and all goodies.
+ * hpk has not often seen such nice c++ code ...
+ <sanxiyn> So it's not really a new idea. But that means we have lots of experience to learn from.
+ <arigo> sanxiyn: yes
+ * sanxiyn loads screamer intro he downloaded but have never read.
+ <hpk> hmmm, it's really a high level c++ code, probably pretty easy to convert to python (the parts i have seen)
+ <sanxiyn> How much code is LLVM?
+ <hpk> i have no idea
+ <hpk> i just read the commit mails
+ <sanxiyn> ls
+ <sanxiyn> PyPy is currently 39844 lines of code.
+ <sanxiyn> (22000 of them is PyPy, 16000 Pyrex.)
+ <hpk> what?
+ <hpk> 16000 pyrex? what do you mean?
+ <sanxiyn> Plex + Pyrex is 16000 lines.
+
+ Oct 28 16:10:18 --> pedronis (~sp at 91.51.202.62.dial.bluewin.ch) has joined #pypy
+ <sanxiyn> Hello.
+ <pedronis> hi
+ <hpk> pedronis: hi samuele
+ <arigo> hi samuele
+ * sanxiyn downloads LLVM 1.0
+ <sanxiyn> hpk: what do you think about line count? :)
+ * arigo downloads LLVM 1.0 too
+ <pedronis> why do we need to be so fast with LLVM, is why they want to setup a public repo and we want to offer hosting it?
+ <sanxiyn> we don't need to be hasty, right.
+ <sanxiyn> hpk: eh. should I register to download?
+ <hpk> i just did :-)
+ <arigo> so did i :-)
+ <hpk> with real name and all :-)
+ <sanxiyn> me too.
+ <hpk> pedronis: it cant hurt to contact them informally and see/talk about ideas i think
+ <sanxiyn> well. it's *huge*;
+ <hpk> pedronis: if we find out that we were over-enthusiastic we have not lost much, i think
+ <arigo> pedronis: i think their project is interesting, for PyPy or not, and holger talked about offering hosting
+ <arigo> pedronis: but mostly i'm sure if llvm is well written it is excellent for PyPy
+ <arigo> pedronis: this needs to be checked and discussed of course
+ <pedronis> arigo: what I'm not sure, and we should ask is how much they are interested in optimization for VHLL
+ <arigo> as opposed to C-like languages ?
+ <pedronis> arigo: yup, it seems that LLVM needs to be extended for things like exact GC, or some possible lookup opts for VHLL
+ <arigo> yes, i think the VHLL is supposed to do language-specific optimizations itself
+ * sanxiyn mentions Parrot... not.
+ <arigo> and only emit a low-level code that contains enough information for good low-level optimization
+ <sanxiyn> Parrot is the only explicitly VHLL VM I know of.
+ <pedronis> arigo: it seems they are interested in things like region-based memory allocation etc
+ <arigo> yes, which is fine i think
+ <pedronis> arigo: which goes more in the device driver, OS kernel direction
+ <hpk> quote: The Python test classes are more UNIX-centric than they should be, so porting to non-UNIX like platforms
+ <arigo> we can have refcounted regions and garbage-collected ones
+ <hpk> (i thought it's interesting that they are using python for something :-)
+ <sanxiyn> hpk: Many projects use Python for unittesting, but usually they have not much to do with Python.
+ <sanxiyn> For example, svn uses Python for unittesting.
+ <hpk> sanxiyn: sure, but it's still significant information
+ <arigo> pedronis: llvm is definitely a low-level tool
+ <hpk> and BIND and whatnot
+ <sanxiyn> Yes. It tells us they know about Python. :)
+ <pedronis> arigo: yes, the point is whether they are happy extending it to support non-low-level stuff
+ <arigo> pedronis: i'm thinking about it at least as a very good alternative to C for the translator
+ <arigo> pedronis: but i think they would be happy to design some "hooks" needed for high-level languages
+ <arigo> pedronis: they don't have Java yet for example but mention wanting to look in that direction
+ <pedronis> arigo: OK, so using their static compiler?
+ <arigo> pedronis: at least
+ <arigo> pedronis: we should try to write "genllvm.py"
+ <sanxiyn> If RPython can be translated to C, it surely can be translated to LLVM.
+ <sanxiyn> And moreover, as Psyco do (perhaps I'm wrong here), some Applevel Python function may be able to be JITted by (LLVM or whatever).
+ <arigo> pedronis: i think the experiment is worth being made
+ <arigo> sanxiyn: yes, that's what is beyond my "at least" :-)
+ <pedronis> arigo: well the experiment is cheap
+ <sanxiyn> arigo: Will you post log and summary for binding concept and forward-dependency, constraint-based programming?
+ <arigo> sanxiyn: yes
+ <arigo> pedronis: yes
+ <sanxiyn> topic is moving farther and farther from that.
+ <arigo> sanxiyn: i've saved the relevant parts, will edit them when i've a minute
+ <sanxiyn> ah, ok.
+ <pedronis> arigo: my issue is how much their JIT is usable and drivable at runtime, and integration with things like GC etc
+ <pedronis> arigo: OTOH yes as target of the translator, that another situation
+ <arigo> pedronis: yes for the JIT it needs more investigation
+ <arigo> pedronis: for full Psyco i'd need compilation of basic-blocks-at-a-time (not whole functions at a time)
+ <pedronis> arigo: yes, I know that, is one of the thing I was wondering about
+ <sanxiyn> I remember Psyco does very complex things to accomplish that.
+ <arigo> pedronis: right now i'm pretty enthusiastic because the LLVM language is just the same as our flowgraphs, so we could probably at least have a JIT for RPython
+
+ Oct 28 16:31:22 --> faassen (~faassen at a213-84-57-72.adsl.xs4all.nl) has joined #pypy
+ <arigo> hi martijn
+ <pedronis> arigo: yes or just static compilation
+ <pedronis> arigo: it seems they are investigating trace-based techniques like Dynamo
+ <arigo> pedronis: actually, i don't know many projects with a good runtime compiler that accepts an in-memory SSA representation of code
+ <faassen> hey.
+ <arigo> pedronis: this alone makes llvm interesting, for many projects that I can think about besides or on top of PyPy
+ <sanxiyn> So LLVM is already a rare case?
+ <hpk> what really impresses me is how their website and the source code is done
+ <hpk> faassen: hi martijn
+ <faassen> hpk: hey! :)
+ <arigo> pedronis: trace techniques are nice, Psyco's profiler is a bit primitive
+ <sanxiyn> website is impressive. I don't know C++ very well to judge the code. :(
+ <hpk> sanxiyn: trust me it's better than average :-)
+ <faassen> what website is that? :)
+ <arigo> pedronis: at this point i think we should at least consider using llvm even if we have to change a bit the C++ code to add a couple of instructions.
+ <hpk> http://llvm.cs.uiuc.edu/#subprojects
+
+... cut at Martijn's arrival :-)
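
Editorial illustration (not part of the commit): the log above describes SSA as "never write to a variable more than once"; the abbreviation is usually expanded as "static single assignment". The following is a minimal sketch of that property for straight-line code only, assuming a hypothetical (target, opname, args) tuple format. The function name to_ssa and the tuple layout are assumptions for illustration and are taken neither from PyPy's flow graphs nor from LLVM; branches and joins are deliberately left out.

```python
def to_ssa(ops):
    """Rename assignment targets so that every name is written exactly once.

    `ops` is a straight-line list of (target, opname, args) tuples.
    This hypothetical format is only for illustration.
    """
    version = {}   # base name -> number of assignments seen so far
    current = {}   # base name -> SSA name holding its latest value
    result = []
    for target, opname, args in ops:
        # Read the arguments first: they refer to the latest versions.
        new_args = tuple(current.get(arg, arg) for arg in args)
        # Then create a fresh name for the value being defined.
        version[target] = version.get(target, 0) + 1
        ssa_name = "%s_%d" % (target, version[target])
        current[target] = ssa_name
        result.append((ssa_name, opname, new_args))
    return result


ops = [
    ("x", "add", ("a", "b")),
    ("x", "mul", ("x", "c")),   # rewrites x: not single-assignment
    ("y", "sub", ("x", "a")),
]
ssa_ops = to_ssa(ops)
for op in ssa_ops:
    print(op)
```

After renaming, the second write to x becomes a new name (x_2), so each name has exactly one definition and the order of evaluation is explicit in the list of operations.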
Deleted: /pypy/trunk/doc/translation/annotations.txt
==============================================================================
--- /pypy/trunk/doc/translation/annotations.txt Tue Oct 28 18:21:28 2003
+++ (empty file)
@@ -1,169 +0,0 @@
-About annotations
-=================
-
-Irc log from October, the 28th::
-
- <arigo> sanxiyn: ok for a few words about annotations?
- <sanxiyn> yep!
- <sanxiyn> (sorry for being out; I forgot it...)
- <arigo> np
- <arigo> mutable structures pose some problems
- <sanxiyn> e.g.
- <arigo> because you cannot say "len(x) = 5" if 'x' is a list, of course
- <arigo> because the length of x could change
- <arigo> so just propagating the annotation is wrong
- <sanxiyn> ah.
- <arigo> it's more annoying to say e.g. that x is a list of integers
- <sanxiyn> Is it annoying?
- <arigo> getitem(x, anything) = y & type(y) = int
- <sanxiyn> yep.
- <arigo> but what if you call f(x)
- <arigo> and f adds strings to the list x ?
- <sanxiyn> I think RPython list shall be homogenous.
- <arigo> yes, but:
- <arigo> x = []
- <arigo> f(x)
- <arigo> then f is allowed to put strings in x
- <sanxiyn> ah, empty list thing...
- <arigo> yes but also:
- <arigo> x = ['hello']
- <arigo> f(x)
- <sanxiyn> ML languages have precisely the same problem, aren't they?
- <arigo> yes but i think we can solve it here
- <arigo> but we need to be careful
- <sanxiyn> special casing empty list should work. (IIRC that's how it's done in ML, basically)
- <arigo> yes but i think we can solve it here (didn't i say that already :-)
- <sanxiyn> agreed. so let's solve it;
- <sanxiyn> :)
- <arigo> won't help verbosity, but let's think about that later.
- <sanxiyn> List length seems to be impossible to guarantee.
-
- <arigo> we can say:
- <arigo> deref(x) = z ; getitem(z, anything) = y ; type(y) = int
- <arigo> here x is our variable, but z is a Cell()
- <arigo> so the list has a life of its own, independently from the variable it is in
- * sanxiyn reads it carefully.
- <arigo> what i'm thinking about is this:
- <arigo> we would have (conceptually) a single big pool of annotation
- <arigo> not one AnnotationSet per basic block
- <arigo> only one, for the whole program
- <sanxiyn> Yes. I found annset per block annoying, and felt that it's that way for no real reason.
- <arigo> we would map variables to this big annotation set
- <arigo> this must probably still be done for each block independently
- <arigo> each block would have a map {variable: cell-in-the-big-annset}
- <arigo> or maybe not
- <sanxiyn> hm
- <arigo> because variables are supposed to be unique anyway
- <arigo> still, i think the big annset should not use variables at all, just cells and constants.
- <sanxiyn> comments in get_variables_ann say otherwise, but I suspect it's outdated.
- <arigo> "supposed" to be unique... no, they still aren't really
- <sanxiyn> eh, confused.
- <arigo> the comment is not outdated
- <sanxiyn> what does it mean, then?
- <arigo> the same Variable() is still used in several blocks
- <arigo> that should be fixed
- <sanxiyn> indeed.
- <sanxiyn> I commented out XXX: variables must not be shared, and ran test_pyrextrans, and got 6 failures.
- <arigo> yes
- <arigo> all EggBlocks are wrong, currently
- <sanxiyn> I don't know what Spam/Egg Blocks are.
- <arigo> :-)
- <sanxiyn> Don't know at all.
- <arigo> it's funny names describing how the block was built
- <arigo> they are all Blocks
- <arigo> an EggBlock is used after a fork
- <sanxiyn> fork?
- <arigo> a split, after a block with two exits
- <arigo> but that's not relevant to the other transformations
- <arigo> which can simplify the graph after it is built
-
- <arigo> we could have a single big annset
- <arigo> it represents "the heap" of an abstract CPython process
- <sanxiyn> hm.
- <arigo> i.e. objects in the heap
- <arigo> like lists, integers, all of them
- <arigo> using Cell() to represent abstract objects, and Constant() for concrete ones
- <arigo> then a variable is only something which appears in the basic block's SpaceOperations
- * arigo is confused
- <sanxiyn> So Variable() points to Cell().
- <arigo> yes...
- <arigo> currently we cannot handle mutable lists because:
- <arigo> getitem(x, *) = z
- <arigo> is an annotation talking about the variable x
- <arigo> so we cannot propagate the annotation forth and back to called sub-functions
- <arigo> instead, getitem should talk about an object, not the variable that points to it
- <sanxiyn> exactly!
- <sanxiyn> That's Python-think. :)
- <sanxiyn> http://starship.python.net/crew/mwh/hacks/objectthink.html
- <sanxiyn> Is mwh's wonderful piece "How to think like a Pythonista" relevant here?
- * arigo tries to do 4 things at the same times and fails to
- <sanxiyn> So variables are names.
- <sanxiyn> It binds.
- <arigo> yes
- <sanxiyn> mwh wrote: "I find the word variable to be particularly unhelpful in a Python context..."
- <sanxiyn> with wonderful diagrams :)
- <hpk> yah, introducing namespaces into abstract-interpretation world! :-)
- <sanxiyn> namespace? eh, not exactly, I think...
- <arigo> hpk: yes, each block is its own namespace here :-)
- <arigo> and obviously we need "heap objects" that these names can refer to
- <hpk> (namespaces in the meaning of "living" bindings between names and objects)
- <sanxiyn> So "objects" are actually cells unless constant-propagated...
- <arigo> yes...
- <arigo> i think we could even go for a full-Prolog representation:
- <arigo> the "big heap" contains cells and constants. cells can become constants when we know more about them.
- * sanxiyn should read Borges and Calvino as Martellibot suggested. :)
- <arigo> seems cleaner than the current cell-variable-constant mix.
- <arigo> in other words, a SpaceOperation uses variables only,
- <arigo> and the variable can refer to a cell or a constant from the heap...
- <arigo> the point is that the objects in the heap can be manipulated
- <arigo> say a variable v1 points to a cell c
- <arigo> with type(c) = list and len(c) = 3
- <sanxiyn> v2 = v1 and v1 points to the same cell c.
- <sanxiyn> you modify v2 and v1 is modified, too, etc.
- <arigo> yes exactly
- <arigo> if you append an item to the list then the annotation len(c) = 3 is deleted
- <sanxiyn> Is "prolog" a pronoun for "non-determinism"?
- <arigo> Logic Programming i think
-
- <sanxiyn> arigo: I think that solves "reflow".
- <arigo> sanxiyn: yes, possibly
- <arigo> you can add annotations freely, at least
- <arigo> that's fine
- <arigo> we'll just need a trick to delete ("retract") annotations
- <arigo> because other annotations may depend on this one
- <arigo> like type(c3)=int is only valid if type(c1)=int and type(c2)=int because we used an 'add' operation
- <sanxiyn> Currently flowin does similar thing.
- <sanxiyn> It recomputes all annotations if len(annset) is decreased.
- <arigo> sanxiyn: yes, but it should work without the need to re-flowin
- <sanxiyn> eh?
- <sanxiyn> without re-flowin?
- <arigo> if you delete an annotation, then you must recompute annotations recursively on the rest of the graph
- <sanxiyn> yes, how to avoid that?
- <arigo> we can record dependencies
- <arigo> each annotation "knows" that it depends on some other ones
- <hpk> question is if there are different ways of "depending" or just one way
- <arigo> hpk: right
- <hpk> in a way a space operation modifying the assertions denotes 'edges' in this dependency graph?
- <arigo> yes
- <sanxiyn> I think annotation should know about *others* which depend on itself, not which itself depends on.
- <arigo> yes
- <arigo> when you kill an annotation, just follow the forward dependencies to kill the ones it depends on
- <sanxiyn> So not dependency... reverse dependency? :)
- <arigo> forward dependency... ?
- <sanxiyn> Should be easy to add.
- <hpk> "reasons"?
- <hpk> origin?
- <sanxiyn> hpk: no, consequences.
- <sanxiyn> hpk: neither reason nor origin.
- <arigo> "dependents" ?
- <sanxiyn> As in SF novel "time patrol", if you change the past, the future is all changed.
- <sanxiyn> how about consequences? I'm not good at naming...
- <hpk> too long :-)
- <sanxiyn> implication
- <sanxiyn> too long ;
- <arigo> consequences is fine if you don't have to type it too often :-)
- <hpk> hmmm.
- <arigo> i guess we need an Annotation class whose constructor takes a list of dependencies, and records 'self' in these dependencies' "consequences" or whatever
- <sanxiyn> I think only deletion routine need to refer it.
-
-...cut. So if you have a good name for that attribute, speak up :-)
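
Editorial illustration (not part of the commit): the end of the log proposes an Annotation class whose constructor takes a list of dependencies and records itself in their "consequences", so that killing an annotation also kills everything derived from it without re-flowing the graph. The sketch below is one possible reading of that idea, assuming hypothetical names throughout (Cell, Annotation, Heap, consequences, kill). It is not PyPy's actual annotation code. Here kill() follows the forward links, that is, it retracts the annotations that depend on the one being killed.

```python
class Cell:
    """Abstract heap object (hypothetical stand-in for the Cell() in the log)."""

    def __init__(self, name):
        self.name = name


class Annotation:
    """A fact such as type(c) = int, with links to the facts derived from it."""

    def __init__(self, predicate, cell, value, dependencies=()):
        self.predicate = predicate
        self.cell = cell
        self.value = value
        self.alive = True
        self.consequences = []          # annotations that depend on this one
        for dep in dependencies:
            dep.consequences.append(self)

    def kill(self):
        """Retract this annotation and, recursively, all its consequences."""
        if not self.alive:
            return
        self.alive = False
        for ann in self.consequences:
            ann.kill()


class Heap:
    """The single big pool of annotations, talking about cells, not variables."""

    def __init__(self):
        self.annotations = []

    def add(self, predicate, cell, value, dependencies=()):
        ann = Annotation(predicate, cell, value, dependencies)
        self.annotations.append(ann)
        return ann

    def live(self):
        return [(a.predicate, a.cell.name, a.value)
                for a in self.annotations if a.alive]


heap = Heap()

# type(c3) = int holds only because type(c1) = int and type(c2) = int ('add').
c1, c2, c3 = Cell("c1"), Cell("c2"), Cell("c3")
t1 = heap.add("type", c1, int)
t2 = heap.add("type", c2, int)
t3 = heap.add("type", c3, int, dependencies=[t1, t2])

# Two variables bound to the same cell: the annotation belongs to the cell.
c = Cell("c")
bindings = {"v1": c, "v2": c}
length = heap.add("len", c, 3)

length.kill()   # an append through v2 invalidates len(c) = 3 for v1 too
t2.kill()       # retracting type(c2) = int also retracts type(c3) = int
print(heap.live())
```

Because the length annotation is attached to the cell rather than to v1 or v2, a mutation through either name retracts it for both, which is the aliasing case discussed in the log.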