From dooms at info.ucl.ac.be Tue Oct 5 14:32:31 2004
From: dooms at info.ucl.ac.be (Grégoire Dooms)
Date: Tue, 05 Oct 2004 14:32:31 +0200
Subject: [pypy-dev] [Fwd: CfP: Bytecode05, Edinburgh, Scotland, UK, April 9, 2005]
Message-ID: <4162945F.1040106@info.ucl.ac.be>

Call for Papers

Bytecode05
The first Workshop on Bytecode Semantics, Verification, Analysis and Transformation
April 9, 2005, Edinburgh, Scotland (co-located with ETAPS'05)
www.sci.univr.it/~spoto/Bytecode05

- Aims and Scope of the Workshop:

Bytecode, such as that produced by Java and .NET compilers, has become a topic of interest, both for industry and academia. The industrial interest stems mainly from the fact that bytecode is typically used in critical environments, such as the Internet and smart cards. Moreover, an important characteristic of bytecode is that it is device-independent and allows dynamic loading of classes. For researchers who wish to apply formal methods to bytecode, this dynamic nature provides an extra challenge. In addition, the unstructured nature of the code and the pervasive presence of the stack pose extra challenges for the analysis of bytecode. This workshop will focus on the latest developments in the semantics, verification, analysis and transformation of bytecode. Both new theoretical results and tool demonstrations are welcome.
- Program Committee:
* Frederic Besson, IRISA, France
* Etienne Gagnon, Universite du Quebec a Montreal, Canada
* Marieke Huisman, INRIA Sophia Antipolis, France
* Fausto Spoto, Universita di Verona, Italy (chair)
* Don Syme, Microsoft Research, UK

- Invited Speaker:
* Xavier Leroy, INRIA Rocquencourt & Trusted Logic, France

- Important Dates:
* December 19, 2004: Paper submissions
* January 16, 2005: Notifications to authors
* January 30, 2005: Camera-ready
* April 9, 2005: Workshop

- Paper Submissions:

Submissions will be evaluated by the Program Committee for inclusion in the proceedings, which will be available at the time of the workshop. Papers should be no longer than 15 pages. They must contain original contributions, be written in English, and be unpublished and not submitted simultaneously for publication elsewhere. They should be submitted electronically, preferably as postscript or PDF files, to fausto.spoto at univr.it, also providing a text-only abstract and detailed contact information for the corresponding author. Proceedings are intended to be published in the ENTCS series (information for authors can be found at www1.elsevier.com/gej-ng/31/29/23/show/Products/notes/ENTCS/guide.htt). Thus, adhering to that style already in the submission phase is strongly encouraged.

- Venue:

The workshop will be held in Edinburgh, Scotland, UK, co-located with ETAPS'05.

You are invited to The 2nd International Mozart/Oz Conference (MOZ 2004) Charleroi, Belgium, Oct.
7-8, 2004 http://www.cetic.be/moz2004

From hpk at trillke.net Sat Oct 9 18:18:08 2004
From: hpk at trillke.net (holger krekel)
Date: Sat, 9 Oct 2004 18:18:08 +0200
Subject: [pypy-dev] important codespeak news
Message-ID: <20041009161808.GK16212@solar.trillke.net>

Hi everyone using services on codespeak,

first of all, please don't simply reply to this mail: it is a crossposting, so a reply would go to a lot of mailing lists. It is probably the last crossposting, though. Why? See below.

Jens-Uwe Mager and I are currently working on the new codespeak hardware and software setup. It will run subversion 1.1 among other things, and we plan to introduce proper SSL support, a trac-based project environment, and more redundancy. Our first step, however, will be migrating the hardware.

We now recommend that everybody who works with codespeak services subscribes to our new announce mailing list:

    http://codespeak.net/mailman/listinfo/codespeak-ann

This will from now on be the sole place where we announce news regarding codespeak - including downtimes and important upgrades or problems. It is expected to be a very low volume list, and it is read-only.

For developing and discussing infrastructure enhancements, please subscribe to our development list, located here:

    http://codespeak.net/mailman/listinfo/codespeak-dev

This serves as the place to discuss introducing the trac project management environment and other general improvements. We also welcome further help and suggestions, and we can rather easily grant sysadmin privileges because we have full version control of the codespeak configuration files. For example, if you want to know more details regarding this announcement, you should subscribe to codespeak-dev and post your question.
cheers,

    Holger Krekel & Jens-Uwe Mager

From arigo at tunes.org Mon Oct 11 14:46:39 2004
From: arigo at tunes.org (Armin Rigo)
Date: Mon, 11 Oct 2004 13:46:39 +0100
Subject: [pypy-dev] Re: [pypy-sprint] vilnius sprint planning progress
In-Reply-To: <20041008101843.GB19383@solar.trillke.net>
References: <20041008101843.GB19383@solar.trillke.net>
Message-ID: <20041011124639.GA9880@vicky.ecs.soton.ac.uk>

Hi Holger,

On Fri, Oct 08, 2004 at 12:18:44PM +0200, holger krekel wrote:
> Armin, Samuele, it would be great if especially the two of you
> (or at least one of you) think hard about how to split up work
> for the translate-pypy goal so that multiple teams can work on
> it. If you have ideas please post them to pypy-dev.

As far as I can see now, this goal can be divided into three relatively independent tasks:

A. Obtain the complete flow graph of the RPython part of PyPy.
B. Perform type inference and optimizations on the flow graph.
C. Generate C code from the flow graph.

A. PyPy -> FlowGraph
====================

See goal/translate_pypy.py; run it and fix things until this script processes the whole of PyPy.

This requires updating things in PyPy when they are not RPythonic enough. In particular, there are some more efforts to be done on "caches": lazily built objects. Generally, the code to build such objects is not RPython; so for RPython, the objects must be built in advance. The flow space must force these caches to be completely built. This part can be done independently. The goal would be to get a complete graph, which the existing genc.py can (mostly) already translate to C and run, for testing. (This will not be extremely fast because genc.py doesn't use type information yet, but it should be a bit faster than the pure Python py.py.)

B. FlowGraph -> Optimized FlowGraph
===================================

Still open to discussion. What exactly should be done here, and how?
An idea is that we could provide a set of rules that transform some operations according to the type inferred for their arguments. This would introduce new operations that work on individual fields of PyObjects instead of just calling PyObject_Xxx() on them. Global analysis can further discover when PyObjects can be inlined into a parent, when they don't need reference counters, etc.

C. Optimized FlowGraph -> C code
================================

See genc.py. This not-too-long piece of code translates a regular flow graph into C code, and it seems to work fine, mostly. There are a few missing RPython constructions (e.g. exception handling) that A will generate. In parallel, the optimizations introduced in B will produce flow graphs with new kinds of operations whose support must then be added to genc.py. So if you'd like to work on genc.py, people working on A and B will keep throwing at you new kinds of flow graphs and optimization-related data to support.

Armin

From hpk at trillke.net Mon Oct 11 22:26:30 2004
From: hpk at trillke.net (holger krekel)
Date: Mon, 11 Oct 2004 22:26:30 +0200
Subject: [pypy-dev] Re: [pypy-sprint] vilnius sprint planning progress
In-Reply-To: <20041011124639.GA9880@vicky.ecs.soton.ac.uk>
References: <20041008101843.GB19383@solar.trillke.net> <20041011124639.GA9880@vicky.ecs.soton.ac.uk>
Message-ID: <20041011202630.GB15456@solar.trillke.net>

Hi Armin,

[Armin Rigo Mon, Oct 11, 2004 at 01:46:39PM +0100]
> A. PyPy -> FlowGraph
> ====================
>
> See goal/translate_pypy.py; run it and fix things until this script
> processes the whole of PyPy.
>
> This requires updating things in PyPy when they are not RPythonic enough.
> In particular, there are some more efforts to be done on "caches": lazily
> built objects. Generally, the code to build such objects is not RPython;
> so for RPython, the objects must be built in advance. The flow space must
> force these caches to be completely built. This part can be done
> independently.
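[Editorial aside: task A in the quoted mail is about obtaining flow graphs of RPython code. As background, here is a toy sketch of what such a graph might look like and how it can be run on sample inputs. All names (Block, Link, Op) and the layout are invented for illustration; they are not the actual PyPy data structures.]

```python
# A toy model of a flow graph: blocks of straight-line operations connected
# by links, plus a tiny interpreter.  Illustrative only -- not PyPy code.

class Op:
    def __init__(self, opname, args, result):
        self.opname, self.args, self.result = opname, args, result

class Link:
    def __init__(self, target, args):
        self.target, self.args = target, args  # values handed to the target

class Block:
    def __init__(self, inputargs, operations, exits, exitswitch=None):
        self.inputargs = inputargs      # names of variables entering the block
        self.operations = operations    # straight-line list of Ops
        self.exits = exits              # outgoing Links ([] = return block)
        self.exitswitch = exitswitch    # variable choosing between two exits

OPS = {'lt0': lambda a: a < 0, 'neg': lambda a: -a}

def interpret(block, values):
    """Run a flow graph on concrete input values."""
    while True:
        env = dict(zip(block.inputargs, values))
        for op in block.operations:
            env[op.result] = OPS[op.opname](*[env[a] for a in op.args])
        if not block.exits:                        # return block reached
            return env[block.inputargs[0]]
        if block.exitswitch is None:
            link = block.exits[0]
        else:                                      # exits = [if-false, if-true]
            link = block.exits[bool(env[block.exitswitch])]
        values = [env[a] for a in link.args]
        block = link.target

# the flow graph of:  def my_abs(x): return -x if x < 0 else x
retblock = Block(['v'], [], [])
negblock = Block(['x'], [Op('neg', ['x'], 'r')], [Link(retblock, ['r'])])
entry = Block(['x'], [Op('lt0', ['x'], 'c')],
              [Link(retblock, ['x']), Link(negblock, ['x'])], exitswitch='c')

assert interpret(entry, [-7]) == 7
```

Running a graph on sample inputs like this is also the kind of graph-level testing discussed later in the thread.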
Because of the current way 'genc.py' is done we can mostly forget about "module completeness" for this goal, right? But for example, we need to properly get stdout/file interaction working otherwise we will never get any output from our first translated PyPy/C version. I am not quite clear on how far the current genc.py goes in letting us use the CPython runtime. > The goal would be to get a complete graph, which the > existing genc.py can (mostly) already translate to C and run, for testing. Well, i consider the how-to-do-exceptions question still a major problem. > B. FlowGraph -> Optimized FlowGraph > =================================== > > Still open to discussion. What exactly should be done here, and how? > An idea is that we could provide a set of rules that transform some > operations according to the type inferred for their arguments. This would > introduce new operations that work on individual fields of PyObjects > instead of just calling PyObject_Xxx() on them. Global analysis can > further discover when PyObject can be inlined into a parent, when they > don't need reference counters, etc. Actually the latter sounds to me like there must already be some algorithms out there that do it. Maybe we should invite Donald Knuth to one of our sprints, anyway :-) More seriously, though, it would be helpful to get some gcc or other compiler people to one of our next sprints to give a lecture and help doing flowgraph transformations. > C. Optimized FlowGraph -> C code > ================================ > > See genc.py. This not-too-long piece of code translates a regular flow > graph into C code, and it seems to work fine, mostly. There are a few > missing RPython constructions (e.g. exception handling) that A will > generate. In parallel, the optimizations introduced in B will produce > flow graphs with new kinds of operations whose support must then be added > to genc.py. 
> So if you'd like to work on genc.py, people working on A and
> B will keep throwing at you new kinds of flow graphs and optimization-related
> data to support.

We also need to think about ways to test this. This is most often currently done by transparently compiling the generated C file and calling functions in it to see if they produce the expected result. However, for the optimizations we should write tests in a more fine-grained way, feeding them graphs and checking the resulting graphs. Otherwise we will over time get subtle errors and segmentation faults :-)

Thanks for the good description already. I have started a wiki page

    http://codespeak.net/moin/pypy/moin.cgi/VilniusSprintTasks

where we should try to put the basic tasks in, the more fine-grained the better. As soon as i know the exact sprint dates we should send out a real sprint announcement and finish the web pages for it.

cheers,

    holger

From hpk at trillke.net Thu Oct 14 17:06:31 2004
From: hpk at trillke.net (holger krekel)
Date: Thu, 14 Oct 2004 17:06:31 +0200
Subject: [pypy-dev] PyPy Vilnius Sprint 15-23 nov 2004
Message-ID: <20041014150631.GO15456@solar.trillke.net>

Hi Pythonistas and interested developers,

PyPy, the python-in-python implementation, is steadily moving on. The next coding sprint will take place in Vilnius, Lithuania, from the 15th to the 23rd of November, 2004, and is organized by the nice Programmers of Vilnius (POV) company. See

    http://codespeak.net/pypy/index.cgi?doc

for more in-depth information about PyPy.

Again, we will be heading towards a first generated C version of our already pretty compliant Python interpreter and types implementation. Last time, before the EuroPython 2004 conference, we actually had a similar goal (a PyPy/C-version) but discovered we had to largely refactor the basic model for attribute accesses. We are now closely mirroring the marvelous "descriptor"-mechanism of CPython.
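[Editorial aside: for readers unfamiliar with the "descriptor"-mechanism mentioned above, attribute lookup in CPython consults the instance's type first, and an object found there can intercept the access via __get__. A minimal, self-contained refresher, not PyPy code:]

```python
# Minimal illustration of CPython's descriptor protocol, which the new
# PyPy attribute-access model mirrors.  The Doubler example is made up.

class Doubler:
    """A non-data descriptor: __get__ computes the attribute on the fly."""
    def __init__(self, attrname):
        self.attrname = attrname
    def __get__(self, obj, objtype=None):
        if obj is None:              # accessed on the class itself
            return self
        return getattr(obj, self.attrname) * 2

class Point:
    double_x = Doubler('x')          # the descriptor lives on the type
    def __init__(self, x):
        self.x = x

p = Point(21)
assert p.double_x == 42              # really Point.double_x.__get__(p, Point)
```

Plain functions are descriptors too: method binding is just their __get__ returning a bound method, which is the machinery the refactored attribute-access model follows.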
If you are interested in participating in our fun and somewhat mind-altering python sprint event then please subscribe at

    http://codespeak.net/moin/pypy/moin.cgi/VilniusSprintAttendants

and look around for more information. You'll find that most of the core PyPy developers are already determined to come. There are also many areas that need attention, so we should have tasks suited for different levels of expertise. At

    http://codespeak.net/moin/pypy/moin.cgi/VilniusSprintTasks

you'll find our sprint planning task list, which will probably grow in the next weeks.

Note that our EU funding efforts are at the final stage now. In the next weeks, quite likely before the Vilnius sprint, we _hope_ to get a "go!" from the european commission. One side effect would be that coders - probably restricted to european citizens - may generally apply for getting travel and accommodation costs refunded for PyPy sprints. This would lower the barrier to entry if you would like to spend your time on a PyPy sprint. However, we will probably need some time to work out the details once we get more information from the EU.

If you have any questions don't hesitate to contact pypy-sprint at codespeak.net or one of us personally.

cheers & a bientot,

    Holger Krekel, Armin Rigo

From arigo at tunes.org Mon Oct 18 11:19:30 2004
From: arigo at tunes.org (Armin Rigo)
Date: Mon, 18 Oct 2004 10:19:30 +0100
Subject: [pypy-dev] Re: [pypy-sprint] vilnius sprint planning progress
In-Reply-To: <20041011202630.GB15456@solar.trillke.net>
References: <20041008101843.GB19383@solar.trillke.net> <20041011124639.GA9880@vicky.ecs.soton.ac.uk> <20041011202630.GB15456@solar.trillke.net>
Message-ID: <20041018091930.GA22093@vicky.ecs.soton.ac.uk>

Hi Holger,

I don't know why, but your answer never reached pypy-dev (as far as I can tell), so I missed it. Sorry for the delay. I will answer with extensive quotes, in case nobody else received it either.
On Mon, Oct 11, 2004 at 10:26:30PM +0200, holger krekel wrote: > > A. PyPy -> FlowGraph > > ==================== > > Because of the current way 'genc.py' is done we can mostly > forget about "module completeness" for this goal, right? But > for example, we need to properly get stdout/file interaction > working otherwise we will never get any output from our first > translated PyPy/C version. I am not quite clear on how far the > current genc.py goes in letting us use the CPython runtime. This is not a problem. The extension module generated by genc.py manipulates PyObject* pointers borrowed from CPython. In this respect genc.py is exactly like the old Python2C: if our quasi-RPython code currently uses 'file' or other borrowed objects then what you get in the C code is a lot of calls like PyObject_Call(&PyFile_Type) and PyObject_GetAttrString(f, "read"). > > The goal would be to get a complete graph, which the > > existing genc.py can (mostly) already translate to C and run, for testing. > > Well, i consider the how-to-do-exceptions question still a major > problem. It wasn't that complex. I did it in a recent check-in :-) Of course, it's cheating -- it uses CPython's exception mechanisms and conventions like returning NULL in case of error. But it's just fine for now. > > B. FlowGraph -> Optimized FlowGraph > > =================================== > > Actually the latter sounds to me like there must already be some > algorithms out there that do it. Maybe we should invite Donald > Knuth to one of our sprints, anyway :-) More seriously, though, > it would be helpful to get some gcc or other compiler people to > one of our next sprints to give a lecture and help doing flowgraph > transformations. Yes, there are probably existing techniques out there. However, I've compared with what common compilers do for various languages and I can't really find anything similar. 
You have on the one hand languages like C++ or Java where, when the programmer declares a structure (or class), the compiler really puts a structure in the heap, and cannot do things like inlining it automatically. In C++ you can inline it yourself: e.g. to inline a structure B in a structure A you would declare A as

    struct A {        // or class A
        B fieldname;
    };

as opposed to 'B* fieldname' when you just want a pointer to the structure. In Java you cannot inline structures at all; all class instances are allocated in the heap.

You have another class of programming languages which have more lightweight structures than Java: the functional languages. But there, all data is immutable, so it is (more or less) irrelevant whether two pointers point to the same structure or to two equal copies. This leads the common compilers to follow a very different memory model.

An exception is maybe the Mercury language. Here, you can use type declarations to constrain the usage of structures: for example, you can say that a function's argument, whose type is "pointer to a struct of type A", should receive the last pointer in existence to that structure. This allows the compiler to optimize the function. Remember that you cannot modify anything in these languages, so it is typical to have functions that take a (big) structure as input argument, and return a copy of the same (big) structure with just a couple of fields modified. In this case, if the input structure is declared as "you have the last pointer to it", then the compiler can actually modify the couple of fields of the structure in-place and return that, instead of having to copy it completely first, because we know that nobody else can notice that we have actually modified an existing structure (which is forbidden in the language!). But again this is a bit different. Moreover, in Mercury the programmer must provide the "uniqueness" annotations; they are not computed algorithmically.
Something else: I now know how to write down the graph transformation rules, in particular the new flow graph that should replace an operation. Forget a new textual syntax for flow graphs, with a parser and everything. We can use the plain RPython syntax and build flow graphs with the FlowObjSpace. For example:

    def inplace_add__List(lst, iterable):
        for x in iterable:
            lst.append(x)
        return lst

is a clear way to express that the operation 'inplace_add', when the first argument is a list, should be replaced by (the flow graph of) the body of this function.

> > C. Optimized FlowGraph -> C code
> > ================================
>
> We also need to think about ways to test this. This is most often currently
> done by transparently compiling the generated C file and calling functions
> in it to see if they produce the expected result. However, for the
> optimizations we should write test in a more fine grained way, feeding
> it graphs and checking the resulting graphs. Otherwise we will over time
> get subtle errors and segmentation faults :-)

Indeed. Testing that flow graphs themselves are "right" is difficult. There is now a checkgraph() function in pypy.objspace.flow.model which checks that there is no structural error in the flow graph; for the semantics, maybe we should use Richard's flow graph interpreter, feeding it sample inputs and checking that we get the correct output.
That's better than compiling because it doesn't involve so much code, and we don't get segfaults if something goes wrong :-) Armin From hpk at trillke.net Mon Oct 18 11:47:40 2004 From: hpk at trillke.net (holger krekel) Date: Mon, 18 Oct 2004 11:47:40 +0200 Subject: [pypy-dev] Re: [pypy-sprint] vilnius sprint planning progress In-Reply-To: <20041018091930.GA22093@vicky.ecs.soton.ac.uk> References: <20041008101843.GB19383@solar.trillke.net> <20041011124639.GA9880@vicky.ecs.soton.ac.uk> <20041011202630.GB15456@solar.trillke.net> <20041018091930.GA22093@vicky.ecs.soton.ac.uk> Message-ID: <20041018094740.GP15456@solar.trillke.net> Hi Armin, [Armin Rigo Mon, Oct 18, 2004 at 10:19:30AM +0100] > Hi Holger, > > I don't know why, but your answer never reached pypy-dev (as far as I can > tell) so I missed it. Sorry for the delay. I will answer with extensive > quotes, in case nobody else received it either. the wonders of email systems and spam filtering at work i guess ... the email is in the pypy-dev archive, though. > On Mon, Oct 11, 2004 at 10:26:30PM +0200, holger krekel wrote: > > > A. PyPy -> FlowGraph > > > ==================== > > > > Because of the current way 'genc.py' is done we can mostly > > forget about "module completeness" for this goal, right? But > > for example, we need to properly get stdout/file interaction > > working otherwise we will never get any output from our first > > translated PyPy/C version. I am not quite clear on how far the > > current genc.py goes in letting us use the CPython runtime. > > This is not a problem. The extension module generated by genc.py manipulates > PyObject* pointers borrowed from CPython. In this respect genc.py is exactly > like the old Python2C: if our quasi-RPython code currently uses 'file' or > other borrowed objects then what you get in the C code is a lot of calls like > PyObject_Call(&PyFile_Type) and PyObject_GetAttrString(f, "read"). 
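[Editorial aside: the Python2C-style translation described in the quote above can be illustrated with a toy emitter. The operation table and emitter below are assumptions for illustration, not genc.py's actual code.]

```python
# Toy sketch of the translation style described above: every high-level flow
# operation becomes a call into the CPython C API on PyObject* values, with
# the usual NULL-check error convention.  Not actual genc.py code.

CPYTHON_API = {                  # assumed operation -> C API mapping
    'add': 'PyNumber_Add',
    'mul': 'PyNumber_Multiply',
    'getitem': 'PyObject_GetItem',
}

def emit_c(operations):
    """Render (result, opname, args) triples as lines of C."""
    lines = []
    for result, opname, args in operations:
        lines.append('%s = %s(%s);' % (result, CPYTHON_API[opname],
                                       ', '.join(args)))
        lines.append('if (%s == NULL) goto error;' % result)
    return '\n'.join(lines)

code = emit_c([('v3', 'add', ['v1', 'v2'])])
# v3 = PyNumber_Add(v1, v2);
# if (v3 == NULL) goto error;
```

This is exactly why the result is slow but complete: every operation round-trips through the generic CPython runtime, whatever the actual types are.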
ok, i was more referring to the fact that 'file' currently is a "faked type" and i am not sure if that translates properly. But given that we just use the underlying cpython runtime API it should work.

> > > The goal would be to get a complete graph, which the
> > > existing genc.py can (mostly) already translate to C and run, for testing.
> >
> > Well, i consider the how-to-do-exceptions question still a major
> > problem.
>
> It wasn't that complex. I did it in a recent check-in :-) Of course, it's
> cheating -- it uses CPython's exception mechanisms and conventions like
> returning NULL in case of error. But it's just fine for now.

ah ok. Are exceptions represented in the flow graph already? yes, i should read the code some more :-)

> > > B. FlowGraph -> Optimized FlowGraph
> > > ===================================
> [some overview of what other languages do with respect to inlining]
> ...
> Something else, I now know how to write down the graph transformation rules,
> in particular the new flow graph that should replace an operation. Forget a
> new textual syntax for flow graphs, with a parser and everything. We can use
> the plain RPython syntax and build flow graphs with the FlowObjSpace. For
> example:
>
>     def inplace_add__List(lst, iterable):
>         for x in iterable:
>             lst.append(x)
>         return lst
>
> is a clear way to express that the operation 'inplace_add', when the first
> argument is a list, should be replaced by (the flow graph of) the body of this
> function.

good idea. Obviously, we want to avoid cyclic transformation rules.

cheers & looking forward to the sprint,

    holger

P.S.: i have set your date on the sprint wiki pages, please check

    http://codespeak.net/moin/pypy/moin.cgi/VilniusSprintAttendants

i plan to arrive early on the 13th, or even the 12th of Nov if there is no early flight on the 13th.
And i am going to invite everyone who is there on the 14th of nov (my birthday) to a couple of beers or whatevers :-)

From ALE at bang-olufsen.dk Tue Oct 19 16:50:11 2004
From: ALE at bang-olufsen.dk (Anders Lehmann)
Date: Tue, 19 Oct 2004 16:50:11 +0200
Subject: [pypy-dev] C keywords as parameters
Message-ID:

I had some trouble getting /goal/translate_pypy to work for me today. The compiler barfed over:

    struct __pyx_obj_11entry_point_W_Root__8329f0 {
        PyObject_HEAD
        PyObject *typedef;
    };

The errors disappear if I insert the following

    pyxcode=pyxcode.replace('typedef','type_def')

into line 219 of translator.py (in the function compile). I won't check in the change as I don't think this is the right solution.

Hope you will have a good sprint in Vilnius (I won't be able to make it, alas)

Anders Lehmann

From arigo at tunes.org Thu Oct 21 11:56:39 2004
From: arigo at tunes.org (Armin Rigo)
Date: Thu, 21 Oct 2004 10:56:39 +0100
Subject: [pypy-dev] C keywords as parameters
In-Reply-To:
References:
Message-ID: <20041021095639.GA12305@vicky.ecs.soton.ac.uk>

Hi Anders,

On Tue, Oct 19, 2004 at 04:50:11PM +0200, Anders Lehmann wrote:
> struct __pyx_obj_11entry_point_W_Root__8329f0 {
>     PyObject_HEAD
>     PyObject *typedef;
> };

Right, I remember that it was one of the annoyances with Pyrex that eventually led me to consider how difficult (or not) it would be to write C code directly instead...

> pyxcode=pyxcode.replace('typedef','type_def')
> I wont check in the change as I dont think this is the right solution.

If someone wants to fix (or even report) the problem in Pyrex, he's welcome. For now I think we can start focusing on the C backend and come back to Pyrex later (if at all).

> Hope you will have a good sprint in Vilnius (I wont be able to make alas)

Thanks!
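[Editorial aside: the "right solution" Anders alludes to might look like the following sketch: when generating C struct fields from Python attribute names, rename exactly the identifiers that collide with C reserved words, instead of a blanket string replace over the whole generated source. This is a hypothetical illustration, not the actual Pyrex or PyPy fix.]

```python
# Hedged sketch of a principled fix: mangle C reserved words when turning
# attribute names into C field names.  (A blanket replace('typedef', ...)
# would also corrupt unrelated occurrences of the word in the output.)

C_KEYWORDS = {
    'auto', 'break', 'case', 'char', 'const', 'continue', 'default', 'do',
    'double', 'else', 'enum', 'extern', 'float', 'for', 'goto', 'if', 'int',
    'long', 'register', 'return', 'short', 'signed', 'sizeof', 'static',
    'struct', 'switch', 'typedef', 'union', 'unsigned', 'void', 'volatile',
    'while',
}

def c_field_name(name):
    """Return a C-safe identifier for a Python attribute name."""
    return name + '_' if name in C_KEYWORDS else name

fields = ['typedef', 'value', 'struct']
decls = ['PyObject *%s;' % c_field_name(f) for f in fields]
# -> ['PyObject *typedef_;', 'PyObject *value;', 'PyObject *struct_;']
```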
Hope to see you again in a future sprint,

    Armin

From arigo at tunes.org Sun Oct 31 12:07:45 2004
From: arigo at tunes.org (Armin Rigo)
Date: Sun, 31 Oct 2004 11:07:45 +0000
Subject: [pypy-dev] More on optimization
Message-ID: <20041031110745.GA25406@vicky.ecs.soton.ac.uk>

Hi!

Sorry for focusing the next sprint so much on translation. This might have put some people off. Well, lesson learned.

It doesn't mean we should stop talking about translation :-) Pushing the previously discussed ideas to their conclusion, we get an interesting point of view...

First, for the context, the current situation in the repository: RPython code can be turned into a flow graph. Then the annotation pass on the flow graph infers types, with the side effect that it infers which functions call which functions, so it helps to build the complete list of functions that should be translated. Finally, genc.py produces naive C code using PyObject* only. The inferred types are not really used; the annotation phase is currently useful only to build the complete call graph. (Let's ignore the Lisp and Pyrex generators for the moment.)

Here is an example with the flow graph's operations on the left and the corresponding C code (after macro expansion) on the right:

    v3 = add(v1, v2)                v3 = PyNumber_Add(v1, v2);

Quite obvious, and quite slow. Imagine instead that there is a C-ish low-level language, whose input "syntax" is of course flow graphs, with only operations like the following ones:

    * llcall(func, arg1, arg2...)   # calls another function
    * lladd_int(x, y)               # adds two ints and returns an int
    * lleq_ptr(x, y)                # compares two pointers for equality
    * llstruct('name')              # creates an empty structure in the heap
    * llget_int(x, 'key')           # reads the field 'key' of struct object x
    * llset_int(x, 'key', value)    # writes the field 'key' of struct object x
    * llget_ptr(x, 'key')           # \
    * llset_ptr(x, 'key', value)    # / the same with fields containing a pointer

The only data types would be "pointer to structure" and the atomic types like "int" and "char". A structure would be essentially like a dictionary, with either strings or integers as keys (a dict with int keys is much like a list, without the ability to insert or remove elements easily).

This very limited set of operations allows interesting global analysis and optimizations, like removing the allocation of structures in the heap altogether, or not writing some fields of some structures when they have a known constant value, or (most commonly) not using a hash table but just a 'C struct' with fields when all the keys are known.

It is easy to modify our regular flow graphs until they only use the above operations: we replace high-level operations like 'add' with calls to (or inlined copies of) support functions like:

    def rpython_add(v, w):
        if type(v) is int and type(w) is int:
            a = llget_int(v, 'ob_ival')
            b = llget_int(w, 'ob_ival')
            x = llstruct()
            llset(x, 'ob_type', int)
            llset(x, 'ob_ival', lladd_int(a, b))
            return x
        else:
            return llcall('PyNumber_Add', v, w)

Only such support functions can use the ll*() functions, which stand directly for the corresponding ll* operation. The ll*() functions themselves are not executable in the above Python source (actually they don't have to be callable objects at all). They are only meant as placeholders.
If you wish, it's just an easy way to write down a complicated flow graph using ll* operations: instead of inventing a syntax and writing a parser for it, we use the Python syntax and the flowobjspace to build our flow graphs. Note that the type() used above is a function defined as:

    def type(v):
        return llget_ptr(v, 'ob_type')

(Actually, we might want to directly execute such code, for testing, with Python dictionaries in most variables and (say) llget_ptr() defined as dict.__getitem__(), but that's not the main goal here.)

Let's come back to PyPy. We can now replace each high-level operation with a call to a support function like the above one. Initially these support functions could contain only the llcall() to the CPython API; then we can progressively add more cases to cover all the RPython behavior. The goal would be that eventually the support functions and the optimizer become good enough to remove all calls to the CPython API. The optimizer can do that using global analysis: e.g. in the above rpython_add(), if both 'v' and 'w' come from a previous llstruct() where 'ob_type' was set to 'int', then we know that the first branch is taken. This is a kind of generalized type inference using global constant propagation; it would supersede half of the annotation pass' job.

Also, the incremental nature of this approach is nice (and very Psyco-like, btw). It is good for testing along the way, and also means that when we reach the goal of supporting enough for RPython, the result is still nicely useful for non-RPython code; in this case, when types cannot be inferred, there are remaining calls to the CPython API. I think that this would be a widely useful tool! I also like a lot the idea of a low-level language and optimizer which is entirely independent of Python and PyPy. We could even (still) devise an assembly-like syntax and a parser for it, and make it a separate release.
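[Editorial aside: the parenthetical suggestion above, executing the ll*() placeholders directly with dicts for testing, can be sketched as follows. The stand-in definitions are an assumption following that suggestion, not code from the repository.]

```python
# Directly executable stand-ins for the ll*() placeholders: structures are
# plain dicts, so a support function like rpython_add() from the mail can be
# unit-tested in pure Python.  Illustrative sketch only.

def llstruct(name=None):        return {}
def llget_int(x, key):          return x[key]
def llget_ptr(x, key):          return x[key]
def llset(x, key, value):       x[key] = value
def lladd_int(a, b):            return a + b
def llcall(func, *args):        raise NotImplementedError(func)

def type(v):                    # shadows the builtin, as in the mail
    return llget_ptr(v, 'ob_type')

def rpython_add(v, w):
    if type(v) is int and type(w) is int:
        a = llget_int(v, 'ob_ival')
        b = llget_int(w, 'ob_ival')
        x = llstruct()
        llset(x, 'ob_type', int)
        llset(x, 'ob_ival', lladd_int(a, b))
        return x
    else:
        return llcall('PyNumber_Add', v, w)

def makeint(n):                 # build the dict model of an int object
    x = llstruct()
    llset(x, 'ob_type', int)
    llset(x, 'ob_ival', n)
    return x

result = rpython_add(makeint(30), makeint(12))
assert llget_int(result, 'ob_ival') == 42
```

The same dicts also make the optimizer's job concrete: a later pass that proves 'ob_type' is always int can delete the dict entirely and keep only the unboxed lladd_int().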
Such an optimizer is obviously not easy to write, but I think it's quite worth the effort. Most of it would be independent of the target language: it would e.g. replace operations with still more precise operations describing the usage of the structs, like

    llheapalloc('C struct name')
    llheapget_int(x, 'field', 'C struct name')

Also, we should look closely for related work. It looks like it *has to* have been done somewhere before... But the kind of global optimizations that we need might be different from the ones any reasonable language would need, because no one would reasonably write code like the rpython_add() above when he actually means to do a simple integer addition :-)

What's essential here, after constant propagation, is to do liveness and alias analysis for the heap structures, so that they can be not allocated in the heap at all as often as possible. Also, if we want to do explicit memory management (i.e. we would have to use a 'llfree' operation) with reference counters, then the above example of rpython_add() becomes even longer -- but of course it should still be optimized away.

A bientôt,

    Armin.

From pedronis at bluewin.ch Sun Oct 31 16:42:33 2004
From: pedronis at bluewin.ch (Samuele Pedroni)
Date: Sun, 31 Oct 2004 16:42:33 +0100
Subject: [pypy-dev] More on optimization
In-Reply-To: <20041031110745.GA25406@vicky.ecs.soton.ac.uk>
References: <20041031110745.GA25406@vicky.ecs.soton.ac.uk>
Message-ID: <418507E9.60403@bluewin.ch>

Armin Rigo wrote:
> Hi!
>
> Sorry for focusing the next sprint so much on translation. This might have
> put some people off. Well, lesson learned.
>
> It doesn't mean we should stop talking about translation :-) Pushing the
> previously dicussed ideas to their conclusion, we get an interesting point of
> view...
>
> First, for the context, the current situation in the repository: RPython code
> can be turned into a flow graph.
> Then the annotation pass on the flow graph
> infers types, with the side effect that it infers which functions call which
> functions, so it helps to build the complete list of functions that should be
> translated.

Yes; given that RPython is still OO, call-graph construction and type inference are related.

> Finally, genc.py produces naive C code using PyObject* only.
> The inferred types are not really used;

I still hope that for less C-like target languages, the annotated graph is still directly useful.

> the annotation phase is currently
> useful only to build the complete call graph.  (Let's ignore the Lisp and
> Pyrex generators for the moment.)
>
> Here is an example with the flow graph's operations on the left and the
> corresponding C code (after macro expansion) on the right:
>
>     v3 = add(v1, v2)             v3 = PyNumber_Add(v1, v2);
>
> Quite obvious, and quite slow.  Imagine instead that there is a C-ish
> low-level language, whose input "syntax" is of course flow graphs, with only
> operations like the following ones:
>
>     * llcall(func, arg1, arg2...)   # calls another function
>     * lladd_int(x, y)               # adds two ints and returns an int
>     * lleq_ptr(x, y)                # compares two pointers for equality
>     * llstruct('name')              # creates an empty structure in the heap
>     * llget_int(x, 'key')           # reads the field 'key' of struct object x
>     * llset_int(x, 'key', value)    # writes the field 'key' of struct object x
>     * llget_ptr(x, 'key')           # \
>     * llset_ptr(x, 'key', value)    # / the same with fields containing a pointer
>
> The only data types would be "pointer to structure" and the atomic types like
> "int" and "char".  A structure would be essentially like a dictionary, with
> either strings or integers as keys (a dict with int keys is much like a list,
> without the ability to insert or remove elements easily).
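To make the vocabulary above concrete, here is a toy interpreter for a straight-line flow graph written as ll* operation tuples.  The tuple encoding `(result_var, opname, args...)` is invented for this sketch; the opcode names are the ones from the mail.

```python
# Hypothetical mini-interpreter for flow graphs expressed as ll*
# operation tuples.  The encoding is invented for illustration.

def interpret(ops, env):
    """ops: list of (result_var, opname, args...); env: var name -> value.
    Arguments that are known variable names are looked up; anything
    else (field names, literals) is passed through as a constant."""
    for res, op, *args in ops:
        vals = [env.get(a, a) if isinstance(a, str) else a for a in args]
        if op == 'llstruct':
            env[res] = {}                      # a struct is a dict
        elif op == 'lladd_int':
            env[res] = vals[0] + vals[1]
        elif op == 'llset_int':
            vals[0][vals[1]] = vals[2]         # struct[field] = value
        elif op == 'llget_int':
            env[res] = vals[0][vals[1]]
        else:
            raise NotImplementedError(op)
    return env

ops = [
    ('x',  'llstruct'),
    (None, 'llset_int', 'x', 'ob_ival', 21),
    ('a',  'llget_int', 'x', 'ob_ival'),
    ('v3', 'lladd_int', 'a', 'a'),
]
env = interpret(ops, {})
assert env['v3'] == 42
```

Such an executable form is handy for testing optimizer passes: run the graph before and after a transformation and compare results.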
> This very limited set of operations allows interesting global analysis and
> optimizations, like removing the allocation of structures in the heap
> altogether, or not writing some fields of some structures when they have a
> known constant value, or (most commonly) not using a hash table but just a 'C
> struct' with fields when all the keys are known.
>
> It is easy to modify our regular flow graphs until they only use the above
> operations: we replace high-level operations like 'add' with calls to (or
> inlined copies of) support functions like:
>
>     def rpython_add(v, w):
>         if type(v) is int and type(w) is int:
>             a = llget_int(v, 'ob_ival')
>             b = llget_int(w, 'ob_ival')
>             x = llstruct()
>             llset_ptr(x, 'ob_type', int)
>             llset_int(x, 'ob_ival', lladd_int(a, b))
>             return x
>         else:
>             return llcall('PyNumber_Add', v, w)
>
> Only such support functions can use the ll*() functions, which stand directly
> for the corresponding ll* operation.  The ll*() functions themselves are not
> executable in the above Python source (actually they don't have to be callable
> objects at all).  They are only meant as placeholders.  If you wish, it's just
> an easy way to write down a complicated flow graph using ll* operations:
> instead of inventing a syntax and writing a parser for it, we use the Python
> syntax and the flowobjspace to build our flow graphs.  Note that the
> type() used above is a function defined as:
>
>     def type(v):
>         return llget_ptr(v, 'ob_type')

It seems that along this path we forget the assumption that ints are primitive in RPython, only to get that information back through analysis.  Just to point this out.

> (Actually, we might want to directly execute such code, for testing, with
> Python dictionaries in most variables and (say) llget_ptr() defined as
> dict.__getitem__(), but that's not the main goal here.)
>
>
> Let's come back to PyPy.  We can now replace each high-level operation with a
> call to a support function like the above one.
> Initially these support
> functions could contain only the llcall() to the CPython API; then we can
> progressively add more cases to cover all the RPython behavior.  The goal
> would be that eventually the support functions and the optimizer become good
> enough to remove all calls to the CPython API.  The optimizer can do that
> using global analysis: e.g. in the above rpython_add(), if both 'v' and 'w'
> come from a previous llstruct() where 'ob_type' was set to 'int', then we know
> that the first branch is taken.  This is a kind of generalized type inference
> using global constant propagation; it would supersede half of the annotation
> pass' job.
>
> Also, the incremental nature of this approach is nice (and very Psyco-like,
> btw).  It is good for testing along the way, and also means that when we reach
> the goal of supporting enough for RPython, the result is still nicely useful
> for non-RPython code; in this case, when types cannot be inferred, there are
> remaining calls to the CPython API.  I think that this would be a widely
> useful tool!
>
> I also very much like the idea of a low-level language and optimizer which is
> entirely independent of Python and PyPy.  We could even (still) devise an
> assembly-like syntax and a parser for it, and make it a separate release.

It seems a goal similar to part/some aspect of the LLVM project itself.

>
> Such an optimizer is obviously not easy to write

Yes.

> but I think it's quite worth
> the effort.

I'm still thinking (no answer yet) about whether the major problem you called structure-inlining can be more easily resolved in some RPython-specific way, or whether it is best to attack it in this general way.  The relevant question seems to be: to which program memory locations ((other) struct fields, function locals, inter-function parameters) does a from-this-particular-struct-field value propagate?
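The structure-inlining question can be caricatured as an escape check over straight-line ll* code: a struct whose pointer is never returned, never passed to llcall(), and never stored into another struct can have its fields turned into locals instead of being heap-allocated.  The operation-tuple encoding and the `llreturn` opcode below are invented for this sketch; a real analysis would of course be interprocedural, as Samuele's question implies.

```python
# Hedged sketch of the escape question: which llstruct() results can
# be replaced by locals?  Tuple encoding and 'llreturn' are invented.

def escaping_structs(ops):
    allocated, escaped = set(), set()
    for res, op, *args in ops:
        if op == 'llstruct':
            allocated.add(res)
        elif op in ('llreturn', 'llcall'):
            # returned or passed to another function: escapes
            escaped.update(a for a in args if a in allocated)
        elif op == 'llset_ptr':
            # stored into a field of another struct: escapes
            if args[2] in allocated:
                escaped.add(args[2])
    return escaped

ops = [
    ('x',  'llstruct'),
    (None, 'llset_int', 'x', 'ob_ival', 5),
    ('a',  'llget_int', 'x', 'ob_ival'),
    ('y',  'llstruct'),
    (None, 'llreturn', 'y'),
]
# 'x' never escapes, so its fields could live in locals; 'y' is returned.
assert escaping_structs(ops) == {'y'}
```

Everything in `allocated - escaped` is a candidate for removal, which is exactly the "not allocated in the heap at all" outcome Armin asks for.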
> Most of it would be independent of the target language:

Yes, depending a bit on whether it will introduce/need function pointers, which are not natural for some possible target languages.

> it would
> e.g. replace operations with still more precise operations describing the
> usage of the structs, like:
>
>     llheapalloc('C struct name')
>     llheapget_int(x, 'field', 'C struct name')
>
> Also, we should look closely for related work.  It looks like it *has to* have
> been done somewhere before...  But the kind of global optimizations that we
> need might be different from the ones any reasonable language would need,
> because no one would reasonably write code like the rpython_add() above when
> he actually means to do a simple integer addition :-)  What's essential here,
> after constant propagation, is to do liveness and alias analysis for the heap
> structures, so that they can avoid being allocated in the heap at all as often
> as possible.  Also, if we want to do explicit memory management (i.e. we would
> have to use a 'llfree' operation) with reference counters, then the above
> example of rpython_add() becomes even longer -- but of course it should still
> be optimized away.

This LLVM paper (I just skimmed it) may contain some relevant info, at the very least in its bibliography:

http://llvm.cs.uiuc.edu/pubs/2003-04-29-DataStructureAnalysisTR.html
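The global constant propagation the thread keeps coming back to ("if 'ob_type' was set to 'int', the first branch is taken") can be caricatured in a few lines: record which fields of which llstruct() results are known constants, then decide rpython_add()'s type test statically.  The operation-tuple encoding is invented for this sketch, and a real pass would need to handle merges and unknown stores conservatively.

```python
# Toy constant propagation: track the known-constant fields of each
# struct, then fold the ob_type test in rpython_add().  The tuple
# encoding is invented; only straight-line code is handled.

known = {}   # var name -> dict of fields with known constant values

def record(ops):
    for res, op, *args in ops:
        if op == 'llstruct':
            known[res] = {}
        elif op in ('llset_int', 'llset_ptr') and args[0] in known:
            known[args[0]][args[1]] = args[2]

def branch_of_rpython_add(v, w):
    """'fast' if the int/int branch is provably taken, else 'unknown'."""
    tv = known.get(v, {}).get('ob_type')
    tw = known.get(w, {}).get('ob_type')
    return 'fast' if tv is int and tw is int else 'unknown'

record([
    ('v',  'llstruct'),
    (None, 'llset_ptr', 'v', 'ob_type', int),
    (None, 'llset_int', 'v', 'ob_ival', 2),
    ('w',  'llstruct'),
    (None, 'llset_ptr', 'w', 'ob_type', int),
    (None, 'llset_int', 'w', 'ob_ival', 3),
])
assert branch_of_rpython_add('v', 'w') == 'fast'
```

Once the 'fast' branch is known to be taken, the llcall() to PyNumber_Add is dead code, the ob_type store becomes a known constant, and the escape analysis above can then remove the allocations entirely; chained this way the passes deliver the plain integer addition Armin is after.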