From tjreedy at udel.edu Fri Sep 24 21:06:19 2004
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 24 Sep 2004 15:06:19 -0400
Subject: [pypy-dev] List dormant?
Message-ID:

Accessing the PyPy list via gmane, I have not seen anything for at least a couple of weeks. Is this a reflection of true dormancy? (If so, I will presume it is just temporary.)

Terry J. Reedy

From hpk at trillke.net Fri Sep 24 21:25:17 2004
From: hpk at trillke.net (holger krekel)
Date: Fri, 24 Sep 2004 21:25:17 +0200
Subject: [pypy-dev] List dormant?
In-Reply-To:
References:
Message-ID: <20040924192517.GI19356@solar.trillke.net>

Hi Terry,

[Terry Reedy Fri, Sep 24, 2004 at 03:06:19PM -0400]
> Accessing the PyPy list via gmane, I have not seen anything for at least a
> couple of weeks.
> Is this a reflection of true dormancy?
> (If so, I will presume it is just temporary.)

Indeed, the last pypy-dev mails are from end of August as far as i see. There have been some commits and off-list communication going on, though. Also, we had some discussions on our pypy-funding list regarding EU funding. It seems possible now that PyPy will get funded starting from 1st of November which would clearly allow many of us to work more on pypy.

However, Armin has been making quite some progress on a direct C-backend (not based on our Pyrex-approach) lately which reflects on the pypy-svn list. I am sure he is happy to answer questions or talk about the state of how this is going if someone asks him. Hum, hey Armin, what is the current state of the C backend? :-)

Moreover, it's not unlikely that the next PyPy sprint will take place in Vilnius, Lithuania, with the help of the POV ("Programmers of Vilnius") people. The projected date is around 15th of November till 21st of November 2004.

Btw, we are always very interested in places and possibilities to do coding sprints (not necessarily in Europe!). Most of the current code base has been developed in sprints so far and they are fun events where one learns a lot.
When all goes well i hope that we will have monthly update reports and I guess the key point will be to really put out releases for people to play with.

cheers,

holger

From lac at strakt.com Sat Sep 25 08:47:52 2004
From: lac at strakt.com (Laura Creighton)
Date: Sat, 25 Sep 2004 08:47:52 +0200
Subject: [pypy-dev] List dormant?
In-Reply-To: Message from "Terry Reedy" of "Fri, 24 Sep 2004 15:06:19 EDT."
References:
Message-ID: <200409250647.i8P6lqEr028976@ratthing-b246.strakt.com>

In a message of Fri, 24 Sep 2004 15:06:19 EDT, "Terry Reedy" writes:
>Accessing the PyPy list via gmane, I have not seen anything for at least
>a couple of weeks.
> Is this a reflection of true dormancy?
> (If so, I will presume it is just temporary.)
>
>Terry J. Reedy
>

The action has been in pypy-funding, pypy-sprint and in pypy-checkins. Our Dublin Sprint fell through. But we are organising another one. Want to come to Vilnius?

Laura

From arigo at tunes.org Sat Sep 25 12:57:53 2004
From: arigo at tunes.org (Armin Rigo)
Date: Sat, 25 Sep 2004 11:57:53 +0100
Subject: [pypy-dev] List dormant?
In-Reply-To: <20040924192517.GI19356@solar.trillke.net>
References: <20040924192517.GI19356@solar.trillke.net>
Message-ID: <20040925105753.GA30020@vicky.ecs.soton.ac.uk>

Hi Terry, hi Holger, hi everyone else,

On Fri, Sep 24, 2004 at 09:25:17PM +0200, holger krekel wrote:
> he is happy to answer questions or talk about the state
> of how this is going if someone asks him. Hum, hey
> Armin, what is the current state of the C backend? :-)

Hum, it is a piece of experimental code that has grown quite large. Here is a long e-mail to explain it and why I'd rather like it to be smaller.

It produces a C extension module. The function bodies are, literally, basic blocks in the C sense, with labels, and jumps to each other, following directly the control flow graph's structure.
Additionally, the end of each function contains code that decrefs the variables that need to be decrefed in case of error; each operation that can fail will, in case of failure, jump to some label there.

The kinds of objects supported are enumerated in genc_repr.py: ints, generic PyObjects, tuples, classes, instances, lists, function pointers, method pointers. It all works reasonably well. For example, instances can have both instance and (read-only) class attributes. Methods are just a particular case of class attributes, as in regular Python. (Class attributes are difficult to do in Pyrex; it was one of the motivations to switch to C.)

But at the same time I am not fully satisfied with the C backend. I have both small and big concerns. The biggest concern is about its overall structure. It is really like a classical compiler, taking graph blocks, analysing the operations inside, using the annotations computed previously as a guide; then it produces lower-level operations with explicit conversions, and finally a separate pass turns these into real C code. Here is where all this occurs:

* typer.py turns the block's SpaceOperations into low-level ops
* genc_typeset.py defines which low-level ops exist
* genc_op.py classes to write the C code for each low-level op
* genc_repr.py maps block's Variables to C types
* genc.py calls all other modules; writes the C module
* classtyper.py maps user-defined classes to structs

Additionally, genc.h is inserted verbatim at the beginning of the C module; it contains macros to do the common operations. These macro definitions are also parsed (!) by genc_typeset.py, so that typer.py can know about their existence.

In some sense it is fragile: it depends on the annotations being generated exactly as expected. The (earlier) annotation phase analyses the SpaceOperations and somehow promises that there is a way to do the given operation and guarantees that it produces some results; e.g.
an 'add' when applied to two SomeInteger()s gives another SomeInteger(). But then genc_typeset.py or genc.h must make sure that it actually implements an operation with that signature. There is some duplication there. For example the backend currently expects that a list object is never converted, i.e. if an object is created as a "list of ints" then it will remain a "list of ints" for its whole life. The annotations currently produced have this property, but it's kind of accidental. For the C backend itself it is an implicit external assumption. We could devise by hand annotations that crash the C backend although they are a priori reasonable. Well, another concern is that typer.py is quite confusing and was difficult to get right. I'm not too sure that the code that generates the Py_DECREF() after an error will decref all the correct variables in all cases. Something I don't even dare speak about too much is the way a Variable in the flow graphs maps to potentially a list of unrelated C variables (or fields in a struct). For example, a Variable annotated as a SomeTuple() of two integers will become two distinct C variables. And Variables that are sufficiently constant become zero C variables! This is a nice idea but it makes typer.py, genc_typeset.py and genc_op.py all the more obscure. Of course, I'm always thinking about reorganizing it all... The latest such idea was inspired by Seo who, for the Lisp backend, put in transform.py some code that actually modifies the control flow graph before it is passed to the backend. He replaces the 'newlist' and 'mul' operations corresponding to an expression '[a] * b' with a single custom operation, 'alloc_and_set'. 
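As an illustration of the kind of graph rewrite described here, the following is a minimal sketch, not the code in transform.py. It assumes a single basic block represented as a flat list of operations; the `Op` tuple, the function name `fuse_newlist_mul`, and the argument order chosen for 'alloc_and_set' (length first, then the initial item) are all hypothetical stand-ins.

```python
from collections import namedtuple

# Hypothetical stand-in for a flow-graph operation: result = opname(*args).
Op = namedtuple("Op", "opname args result")


def fuse_newlist_mul(ops):
    """Rewrite 'v = newlist(a); r = mul(v, b)' into 'r = alloc_and_set(b, a)'.

    Only adjacent pairs inside one block are considered, and only when the
    temporary list 'v' is not used by any later operation in that block.
    """
    out = []
    i = 0
    while i < len(ops):
        op = ops[i]
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        if (op.opname == "newlist" and len(op.args) == 1
                and nxt is not None and nxt.opname == "mul"
                and len(nxt.args) == 2
                and nxt.args[0] == op.result
                and nxt.args[1] != op.result
                and not any(op.result in later.args for later in ops[i + 2:])):
            # Assumed argument order: (length, initial item).
            out.append(Op("alloc_and_set", (nxt.args[1], op.args[0]), nxt.result))
            i += 2
        else:
            out.append(op)
            i += 1
    return out


# '[a] * b' followed by an unrelated operation on the result.
block = [
    Op("newlist", ("a",), "v0"),
    Op("mul", ("v0", "b"), "v1"),
    Op("len", ("v1",), "v2"),
]
rewritten = fuse_newlist_mul(block)
```

The point of doing this on the graph is that a backend then only needs to implement the one fused operation instead of recognising the two-operation pattern itself.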
After a lot of unsuccessful efforts in implementing 'list += list' in the C backend I remembered this idea and now transform.py turns a list-based 'inplace_add' into a whole new bunch of control flow blocks:

    # a = inplace_add(b, c)
    # becomes the following graph:
    #
    #  clen = len(c)
    #  growlist(b, clen)     # ensure there is enough space for clen new items
    #        |
    #        |  (pass all variables to next block, plus i=0)
    #        V
    #  ,--> z = lt(i, clen)
    #  |    exitswitch(z):
    #  |      |          |        False
    #  |      | True     `------------------>  ...sequel...
    #  |      V
    #  |    x = getitem(c, i)
    #  |    fastappend(b, x)
    #  |    i1 = add(i, 1)
    #  |      |
    #  `-----'  (pass all variables, with i=i1)

So now the backend only has to worry about the two simple operations 'growlist' and 'fastappend'. The latter is an append where we can assume that there is already enough preallocated space.

The graph transformation code itself is quite verbose, but with more developed utility routines it could be made simpler. The point is that it looks like a good idea to perform as many optimizations and transformations on the flow graph itself before passing it to the backend. (With hindsight it's obvious.) So I'm now thinking about how more of the typer.py mess could be moved there.

One extreme idea would be to say that the flow graph should be transformed much more, step by step, until it eventually contains only operations that have an obvious direct C equivalent. This would make the C backend much simpler again (and also make simple non-C backends fun and quick to implement). This would include typing the operations: the 'add' would be reserved for 'add two PyObject*'; another 'add_i' would add two integers. Conversion operations would be inserted in the flow graph as needed.

So typing would be more tightly coupled with the annotation phase, which I think is a good idea. Essentially, the same code that says that adding two SomeIntegers() produces a SomeInteger() would say that to do so the correct operation is 'add_i'.
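To make the coupling between annotation and typing concrete, here is a small sketch under stated assumptions. The classes `SomeObject` and `SomeInteger` below are simplified stand-ins for the annotator's classes, and the table `TYPED_OPS` and function `specialize` are hypothetical names; nothing here is taken from the actual typer.py.

```python
# Simplified stand-ins for the annotator's classes (hypothetical).
class SomeObject:
    pass


class SomeInteger(SomeObject):
    pass


# One table answers both questions at once:
# (generic op, argument annotations) -> (result annotation, typed op).
TYPED_OPS = {
    ("add", (SomeInteger, SomeInteger)): (SomeInteger, "add_i"),
}


def specialize(opname, arg_annotations):
    """Return (result annotation, operation to emit, indices of args to convert)."""
    key = (opname, tuple(type(a) for a in arg_annotations))
    if key in TYPED_OPS:
        result_cls, typed_op = TYPED_OPS[key]
        return result_cls(), typed_op, []
    # Default: keep the generic PyObject*-level operation; every argument
    # that is not already a generic object needs a conversion first.
    to_convert = [i for i, a in enumerate(arg_annotations)
                  if type(a) is not SomeObject]
    return SomeObject(), opname, to_convert


int_result, int_op, int_conv = specialize("add", [SomeInteger(), SomeInteger()])
gen_result, gen_op, gen_conv = specialize("add", [SomeInteger(), SomeObject()])
```

With a single table, the annotation result and the low-level operation cannot drift apart, which is the duplication the message complains about.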
And the code that says that by default 'add' produces a SomeObject() would say that this requires the two inputs to be converted to PyObject*.

That's all for now...

A bientôt,

Armin

From lac at strakt.com Sat Sep 25 18:09:48 2004
From: lac at strakt.com (Laura Creighton)
Date: Sat, 25 Sep 2004 18:09:48 +0200
Subject: [pypy-dev] after discussion with Armin about RPython
Message-ID: <200409251609.i8PG9m2b030018@ratthing-b246.strakt.com>

which cleared up some misconceptions I had, I got to wonder.

We started out deciding to define RPython. That bogged down. So we decided to make it be 'the minimal set of Python that we need for things to work' defined as 'what we have when we are done writing the translation layer'. This _is_ the pragmatic approach.

But I now wonder if we might benefit from trying to define it more formally again. It might give us (me at any rate) a better idea of exactly what obvious direct C equivalents we need. But perhaps the rest of you can already see this without any formal definition ....

Just a thought,

Laura

From arigo at tunes.org Sat Sep 25 20:48:19 2004
From: arigo at tunes.org (Armin Rigo)
Date: Sat, 25 Sep 2004 19:48:19 +0100
Subject: [pypy-dev] after discussion with Armin about RPython
In-Reply-To: <200409251609.i8PG9m2b030018@ratthing-b246.strakt.com>
References: <200409251609.i8PG9m2b030018@ratthing-b246.strakt.com>
Message-ID: <20040925184819.GA29258@vicky.ecs.soton.ac.uk>

Hi Laura,

On Sat, Sep 25, 2004 at 06:09:48PM +0200, Laura Creighton wrote:
> trying to define it more formally again. It might give us (me at any rate)
> a better idea of exactly what obvious direct C equivalents we need.

The guidelines in svn/pypy/trunk/doc/objspace/restrictedpy.txt are still almost up-to-date. (I just mentioned dictionaries, which we decided to allow with string keys during the last sprint.) We might bite the bullet and write down a more formally complete and less hand-wavy definition, though.
Armin From tjreedy at udel.edu Sun Sep 26 01:51:10 2004 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 25 Sep 2004 19:51:10 -0400 Subject: [pypy-dev] Re: List dormant? References: <200409250647.i8P6lqEr028976@ratthing-b246.strakt.com> Message-ID: "Laura Creighton" wrote in message > Want to come to Vilnius? Someday, say within the next 5 years. If you ever want to do a sprint in Delaware USA, let me know. Terry From lac at strakt.com Sun Sep 26 07:59:20 2004 From: lac at strakt.com (Laura Creighton) Date: Sun, 26 Sep 2004 07:59:20 +0200 Subject: [pypy-dev] http://projects.edgewall.com/qunittest/ Message-ID: <200409260559.i8Q5xKGZ031793@ratthing-b246.strakt.com> I wonder how hard that would be to integrate with our unittest framework? Laura From lac at strakt.com Sun Sep 26 08:55:58 2004 From: lac at strakt.com (Laura Creighton) Date: Sun, 26 Sep 2004 08:55:58 +0200 Subject: [pypy-dev] Re: List dormant? In-Reply-To: Message from "Terry Reedy" of "Sat, 25 Sep 2004 19:51:10 EDT." References: <200409250647.i8P6lqEr028976@ratthing-b246.strakt.com> Message-ID: <200409260655.i8Q6twLh031985@ratthing-b246.strakt.com> In a message of Sat, 25 Sep 2004 19:51:10 EDT, "Terry Reedy" writes: > >"Laura Creighton" wrote in message >> Want to come to Vilnius? > >Someday, say within the next 5 years. >If you ever want to do a sprint in Delaware USA, let me know. > >Terry That sounds like fun. How much lead-time for set-up do you need? Laura From ianb at colorstudy.com Sun Sep 26 09:42:43 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Sun, 26 Sep 2004 02:42:43 -0500 Subject: [pypy-dev] utest, development and discussion Message-ID: <415672F3.9060808@colorstudy.com> I'm interested in using utest for a project of mine, where the tests have gotten a bit out of control -- utest won't control them, but as long as I'm revisiting everything, I figured I might move to a test system I liked. Also, I want to make adding tests more accessible for other contributors. 
First question: I'm not being dumb if I convert all my tests to utest, am I? Not that utest is really a framework like unittest... but if I spend lots of time fiddling with test code, it would be a shame if utest went into disrepair, or was rewritten in a radically different way. Maybe that's not too big an issue, because utest doesn't really have an API, but since utest isn't used much outside of pypy (that I know of) I worry.

Anyway, I've started working with it some. I've added one small feature (dropping into pdb when an exception occurs), and there's sure to be some more, particularly documentation. Where should I send patches? Where should discussion occur? And maybe a website?

Thanks.

--
Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org

From hpk at trillke.net Sun Sep 26 09:50:05 2004
From: hpk at trillke.net (holger krekel)
Date: Sun, 26 Sep 2004 09:50:05 +0200
Subject: [pypy-dev] http://projects.edgewall.com/qunittest/
In-Reply-To: <200409260559.i8Q5xKGZ031793@ratthing-b246.strakt.com>
References: <200409260559.i8Q5xKGZ031793@ratthing-b246.strakt.com>
Message-ID: <20040926075005.GY19356@solar.trillke.net>

Hi Laura,

[Laura Creighton Sun, Sep 26, 2004 at 07:59:20AM +0200]
> I wonder how hard that would be to integrate with our unittest framework?

Look into the source and tell us :-)

Judging from looking at 'trac' which is also hosted at edgewall i guess it shouldn't be hard.

Btw, the 'std' stuff [*] i showed at EuroPython which contains the unittest framework is soon to be renamed/refactored to the root name 'py' and the testing part will be 'py.test'. When i'll get to the point of integrating all this into pypy then i'll ask you for help with the nice renaming tool you have written ...

And while we are at it, does anyone have real experience with "bicycle repair man"? I would like to try it for a renaming/refactoring session and am interested in any experiences.
A refactoring tool supporting refactoring/renaming would probably be of help to PyPy. cheers, holger [*] http://codespeak.net/svn/user/hpk/talks/std-talk.txt From lac at strakt.com Sun Sep 26 10:18:01 2004 From: lac at strakt.com (Laura Creighton) Date: Sun, 26 Sep 2004 10:18:01 +0200 Subject: [pypy-dev] bug in http://codespeak.net/moin/pypy/moin.cgi/FrontPage?action=show Message-ID: <200409260818.i8Q8I1mD032184@ratthing-b246.strakt.com> If you click on the 'documentation' link you get a traceback! ooops. I will look at this later unless somebody beats me to it. Laura From lac at strakt.com Sun Sep 26 10:21:52 2004 From: lac at strakt.com (Laura Creighton) Date: Sun, 26 Sep 2004 10:21:52 +0200 Subject: [pypy-dev] utest, development and discussion In-Reply-To: Message from Ian Bicking of "Sun, 26 Sep 2004 02:42:43 CDT." <415672F3.9060808@colorstudy.com> References: <415672F3.9060808@colorstudy.com> Message-ID: <200409260821.i8Q8LqOU032224@ratthing-b246.strakt.com> In a message of Sun, 26 Sep 2004 02:42:43 CDT, Ian Bicking writes: >I'm interested in using utest for a project of mine, where the tests >have gotten a bit out of control -- utest won't control them, but as >long as I'm revisiting everything, I figured I might move to a test >system I liked. Also, I want to make adding tests more accessible for >other contributors. > >First question: I'm not being dumb if I convert all my tests to utest, >am I? Not that utest is really a framework like unittest... but if I >spend lots of time fiddling with test code, it would be a shame if utest >went into disrepair, or was rewritten in a radically different way. >Maybe that's not too big an issue, because utest doesn't really have an >API, but since utest isn't used much outside of pypy (that I know of) I >worry. The biggest way to get rid of that worry is to have more people like you using it. But as far as I know you will be the first person outside of pypy to do so. I think this would be _great_. 
Holger is the one who is actually working on the utest code. I don't think he has any radical changes planned, but I will let him speak for himself.

If you are converting things wholesale, you might be interested in src/pypy/tool/utestconvert.py -- from the pypy svn repository, a script I wrote that does this automatically. It has only been run on pypy, as far as I know.

Warning! The script goes off and converts 'assert raises' to 'raises' as it lives in the repository. I don't think that 'raises' is currently ready for production -- Holger? -- so you will want to comment out that line of translations.

Also, this tool makes no attempt to understand when it is in a comment, so if you have a comment such as:

""" blah blah and to test do: self.assertEquals(X, Y) blah blah blah """

Then that assert will be happily converted. The following one:

""" blah blah and to test you cannot do: self.assertEquals(X, Y) blah blah overflow exception blah blah unexpected result blah blah blah """

will _not_ get converted, because python's expr will not be able to find something parseable after self.assertEquals, and in that case it just writes out exactly what it saw.

The upshot is that there are _two_ unittest files for utestconvert.py:

pypy/trunk/src/pypy/tool/test/test_utestconvert.py and
pypy/trunk/src/pypy/tool/test/test_utestconvert2.py

test_utestconvert2.py uses the standard python unittest framework. test_utestconvert.py is the same set of tests, written utest style, but I had to do that by hand because this is the canonical file that cannot be converted by the tool itself -- it cheerfully changes 'what you want to change' into 'what you want it changed into'.

>Anyway, I've started working with it some. I've added one small feature
>(dropping into pdb when an exception occurs), and there's sure to be
>some more, particularly documentation. Where should I send patches?
>Where should discussion occur? And maybe a website?

Discussion belongs here.
Patches too unless you want to get a project login, a process that Holger handles. There isn't a separate part of the pypy wiki - or the website for utest. Probably that should change. And documentation belongs here: http://codespeak.net/pypy/index.cgi?doc which we generate out of files in pypy/trunk/doc . You write your docs in ReST and a daemon comes along and makes html out of them for you. >Thanks. > >-- >Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org Oh no, thank _you_. Laura From hpk at trillke.net Sun Sep 26 10:23:34 2004 From: hpk at trillke.net (holger krekel) Date: Sun, 26 Sep 2004 10:23:34 +0200 Subject: [pypy-dev] utest, development and discussion In-Reply-To: <415672F3.9060808@colorstudy.com> References: <415672F3.9060808@colorstudy.com> Message-ID: <20040926082334.GZ19356@solar.trillke.net> Hello Ian, [Ian Bicking Sun, Sep 26, 2004 at 02:42:43AM -0500] > I'm interested in using utest for a project of mine, where the tests > have gotten a bit out of control -- utest won't control them, but as > long as I'm revisiting everything, I figured I might move to a test > system I liked. Also, I want to make adding tests more accessible for > other contributors. hey, nice! > First question: I'm not being dumb if I convert all my tests to utest, > am I? Not that utest is really a framework like unittest... but if I > spend lots of time fiddling with test code, it would be a shame if utest > went into disrepair, or was rewritten in a radically different way. > Maybe that's not too big an issue, because utest doesn't really have an > API, but since utest isn't used much outside of pypy (that I know of) I > worry. It isn't even used much inside pypy right now but in some of my and maybe Armin's own projects mostly. Actually Laura has written a tool for conversion from unittest.py style tests and once the naming/API is getting towards finalization it is going to be applied for PyPy. 
So (i just wrote in that other posting) the biggest change is some pending renaming. Other than that you shouldn't expect any big changes to utest. The configuration file (currently "utest.conf") will very likely be reworked, though ... > Anyway, I've started working with it some. I've added one small feature > (dropping into pdb when an exception occurs), and there's sure to be > some more, particularly documentation. Where should I send patches? > Where should discussion occur? And maybe a website? I have just created the 'py-dev at codespeak.net' mailing list and will soon outline there the planned release and any changes along with some policies for code contributions. I would be happy if you (and others who are interested) join and offer your patches and opinions. http://codespeak.net/mailman/listinfo/py-dev Moreover, you can get write-access to the codespeak repository including the 'py' and 'pypy' part. Just drop me a note with username and ssh-public key. Experienced and known programmers in the community will usually just get such access if they want. And everyone hosting their projects there should know about this and accept this policy. Later in October we hopefully have the new hardware and software codespeak setup ready which should include 'trac' which is then to be used for "the py lib"'s webpages. Sorry that there isn't anything there, yet. cheers, holger From hpk at trillke.net Sun Sep 26 10:26:10 2004 From: hpk at trillke.net (holger krekel) Date: Sun, 26 Sep 2004 10:26:10 +0200 Subject: [pypy-dev] bug in http://codespeak.net/moin/pypy/moin.cgi/FrontPage?action=show In-Reply-To: <200409260818.i8Q8I1mD032184@ratthing-b246.strakt.com> References: <200409260818.i8Q8I1mD032184@ratthing-b246.strakt.com> Message-ID: <20040926082610.GA19356@solar.trillke.net> [Laura Creighton Sun, Sep 26, 2004 at 10:18:01AM +0200] > If you click on the 'documentation' link you get a traceback! ooops. I will > look at this later unless somebody beats me to it. 
fixed, but please don't spam the list with this but send a mail to pypywww at codespeak.net or to me personally.

thanks,

holger

From lac at strakt.com Sun Sep 26 10:28:22 2004
From: lac at strakt.com (Laura Creighton)
Date: Sun, 26 Sep 2004 10:28:22 +0200
Subject: [pypy-dev] http://projects.edgewall.com/qunittest/
In-Reply-To: Message from hpk@trillke.net (holger krekel) of "Sun, 26 Sep 2004 09:50:05 +0200." <20040926075005.GY19356@solar.trillke.net>
References: <200409260559.i8Q5xKGZ031793@ratthing-b246.strakt.com> <20040926075005.GY19356@solar.trillke.net>
Message-ID: <200409260828.i8Q8SMg6032258@ratthing-b246.strakt.com>

In a message of Sun, 26 Sep 2004 09:50:05 +0200, holger krekel writes:
>Hi Laura,
>
>[Laura Creighton Sun, Sep 26, 2004 at 07:59:20AM +0200]
>> I wonder how hard that would be to integrate with our unittest framework?
>
>Look into the source and tell us :-)

Ok, will do.

>Judging from looking at 'trac' which is also hosted at edgewall i guess
>it shouldn't be hard.
>
>Btw, the 'std' stuff [*] i showed at EuroPython which contains
>the unittest framework is soon to be renamed/refactored to the
>root name 'py' and the testing part will be 'py.test'. When
>i'll get to the point of integrating all this into pypy then
>i'll ask you for help with the nice renaming tool you have
>written ...

Ooops, I just said the wrong thing to Ian Bicking, then ... What is the time frame for that? next week? Maybe we should hack on it when I am in Berlin?

>And while we are at it, does anyone have real experience with
>"bicycle repair man"? I would like to try it for a
>renaming/refactoring session and am interested in any
>experiences. A refactoring tool supporting refactoring/renaming
>would probably be of help to PyPy.

Shae Errison, whom I have cc'd this reply to knows about this and has real experience. I got it up and running, played with it for 2 hours, thought 'this was neat' and then never did anything more ....
Laura

>
>cheers,
>
> holger
>
>
>[*] http://codespeak.net/svn/user/hpk/talks/std-talk.txt

From arigo at tunes.org Mon Sep 27 18:10:24 2004
From: arigo at tunes.org (Armin Rigo)
Date: Mon, 27 Sep 2004 17:10:24 +0100
Subject: [pypy-dev] Rethinking genc with graph rewriting
Message-ID: <20040927161024.GA3371@vicky.ecs.soton.ac.uk>

Hi,

I thought a bit more about how to generate the low-level translation of RPython code. It might indeed be possible to do it by rewriting the flow graph until it contains mostly lower-level operations, and then translating it to C code straightforwardly, if we insert an intermediate optimization phase between these two phases. Here are some more details. I'm not sure in which order to present the ideas. I apologize for the lengthy e-mail, it's still all a bit too early to write down as a formal ReST documentation...

===Motivation===

See for example how W_ListObject is implemented in the stdobjspace (which is RPython, too). It has got a size in 'ob_size' and a list of items in 'ob_item'. The naive implementation of a list in RPython is with a PyListObject*, i.e. as a pointer to a structure containing (1) the length and (2) another pointer to the actual items. If we do it this way, then W_ListObject will end up being implemented badly:

    struct W_BadListObject {
        ...header and refcount...
        int ob_size;                       // for the ob_size field
        PySomeKindOfListObject* ob_item;   // for the ob_item field
    };

    struct PySomeKindOfListObject {
        int len;
        PyObject** items;
    };

So it takes two indirections from a W_ListObject to its array of items, although it only takes one in CPython's own lists. Why? Because the 'ob_item' list inside W_ListObjects could, maybe, escape the W_ListObject instances and outlive them, or be modified from somewhere else. If you think about it this way then the extra indirection is needed.
But by looking closely at the source of listobject.py (or its flow graph version), an automatic analysis can figure out that this particular 'ob_item' field never escapes the control of W_ListObject. Thus we don't need the first indirection in this case. The PySomeKindOfListObject can be *inlined* into the structure implementing W_ListObject:

    struct W_ListObject {
        ...type and refcount headers...
        int ob_size;
        PySomeKindOfListObject ob_item;   // not a pointer any more!
    };

Now, it looks a lot like CPython's PyListObject structure, with 'ob_item.len' playing the role of the 'allocated' field of CPython.

I believe that this kind of "structure-inlining" optimization is important. It is something that cannot easily be done in the current genc_* files. That's why I started thinking along the lines I will describe below in more detail.

I like this idea because it also works with simpler types like integers, not just lists: we don't have to do all the type-juggling in genc_* any more; instead, we consider integers as PyIntObject*, and by the same inlining mechanisms replace them with PyIntObject only -- even better, PyIntObject without the type and refcount headers. What remains? The ob_ival field only. In other words by inlining a PyIntObject* declaration we get a structure with only one C "long". (From there it is easy to actually get rid of the struct and replace it with its single field.)

===In more details===

Let's say we start with a flow graph. For this example, let's consider the flow graph generated for the function 'def f(x): return x+1'.

    block1(x):
        add(x, 1) -> y
        goto return_block(y)

The annotation code as it is today will infer that y is a SomeInteger() if we tell it that x is a SomeInteger(). Moreover the '1' is also a SomeInteger() with the attribute 'const' set to 1.

What I propose is that we add a set of rewrite rules, which could be freely applied.
The rules say essentially that whenever we see a given operation with the given annotations, it can be rewritten into one (or possibly several) other operations. For example, a long rule could be:

    method_extend(lst1: SomeList, lst2: SomeList) -> result: SomeList
    -----------------------------------------------------------------
    len(lst2) -> c
    growlist(lst1, c)
    goto loop[i=0]

    loop:
        lt(i, c) -> cond
        switch cond:
            case False: goto sequel
            case True: goto body

    body:
        getitem(lst2, i) -> x
        fastappend(lst1, x)
        add(i, 1) -> i1
        goto loop[i=i1]

    sequel:

Maybe rules should be put in a file not in Python syntax, and parsed by the rule-applying code. I'm not sure if the above pseudo-syntax would do, but maybe something better along these lines would work. Anyway, there would also be simpler rules like:

    add(int1: SomeInteger, int2: SomeInteger) -> result: SomeInteger
    ----------------------------------------------------------------
    add_i(int1.ob_ival, int2.ob_ival) -> result.ob_ival

Note the reference to the field ob_ival of PyIntObjects. The above line says that the operation 'add' (which is PyNumber_Add()), although fine for any kind of objects, could be optimized if we knew that all three involved objects follow the "structure" of SomeInteger. In this case we can perform it using the new operation 'add_i' on the ob_ival fields of the structures.

Once more the intended meaning of the rule is: the operation above the line is perfectly valid on its own -- it would produce a call to PyNumber_Add() in C -- but for efficiency it can be replaced with the operation below the line.

The idea is thus to start from the basic C code generator that just writes PyNumber_Add() and similar calls for any operation, and gradually move to "inlined" operation like add_i which correspond just to "+" in C.

If we only do that, the C code would still manipulate heap-built objects only, i.e.
PyIntObjects; we would just have replaced a call to PyNumber_Add() with inlined code like:

    result = ...create the PyIntObject...;
    ((PyIntObject*) result)->ob_ival = ((PyIntObject*) int1)->ob_ival + ((PyIntObject*) int2)->ob_ival;

The idea is then to use structure inlining to detect and get rid of the PyIntObject (and other heap structures). This requires some kind of whole-program analysis. This analysis would be done as an intermediate phase, between the rule-rewriting and the C generation phase. It would look at how each field of each object is used, and based on this deduce how each structure is best implemented. What I'm thinking about is something like this:

* look if an object's structure is mutable or not, i.e. if we ever write to its fields. (This should distinguish initialization from later rewrites.)

* look if an object is shared or not: if references to an object don't escape too far into various parts of the program then we don't need to refcount it. More precisely, we might be able to assign to the object a "parent" object, a container which is guaranteed to exist at least as long as its child. Thus the "parent" has got the only reference (no refcount needed) and others can borrow it. The parent can be the frame of a function, too, for objects that never outlive the function in which they were created.

The point is that both non-shared and shared but immutable objects can be inlined into their parent, so that we can get rid of the heap allocation and the access indirection. For example, as PyIntObjects are all immutable, they can all be inlined: any "PyObject*" field or variable known to point to a PyIntObject can be replaced with a headerless PyIntObject structure in-place, or even directly with its single "long" field. For the above example we'd get code like this:

    result.ob_ival = int1.ob_ival + int2.ob_ival;

Or after inlining the single field:

    result__ob_ival = int1__ob_ival + int2__ob_ival;

Apart from the long variable names (ugh!
__ob_ival after each RPython variable containing an integer :-), this is good!

Sorry for the long messages, I hope I could get some of my motivations through.

Armin

From arigo at tunes.org Mon Sep 27 19:06:39 2004
From: arigo at tunes.org (Armin Rigo)
Date: Mon, 27 Sep 2004 18:06:39 +0100
Subject: [pypy-dev] Rethinking genc with graph rewriting
In-Reply-To: <20040927161024.GA3371@vicky.ecs.soton.ac.uk>
References: <20040927161024.GA3371@vicky.ecs.soton.ac.uk>
Message-ID: <20040927170639.GA12735@vicky.ecs.soton.ac.uk>

PS. I realize my previous e-mail carried a high risk of confusion. In there, W_ListObject was used as an example of RPython code; it was just an example, unrelated to the fact that the example discusses how to implement lists like the W_ListObject.ob_item field. I could have chosen W_TupleObject or W_DictObject as the example, and it would still have been about how to implement their fields that are RPython lists (e.g. W_TupleObject.wrappeditems or W_DictObject.data).

In the particular example of W_ListObject, we would like (it is our goal) that the W_ListObject class be translated into a C structure with three fields that look like "size", "allocated" and "items_ptr". We want this because it is the best C-ish version of the W_ListObject class, among less efficient variants incurring more indirections; CPython uses this most efficient variant too in its C structure called PyListObject.

This is not to be confused with lists as understood by RPython. The field W_ListObject.ob_item is such a list. They are marked with a SomeList() annotation. The RPython-to-C translator needs to know precisely what such lists are, and how to turn them into C code. We decided earlier that such lists would be translated as some sort of simple C array with no over-allocation, so that they look like a pointer to a structure with a "length" and an "items_ptr" field. This, the translator knows. It is so with any usage of lists in RPython code, not just in W_ListObject.
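To make the target layout concrete, here is a minimal Python sketch of such a flattening step. It is a toy under stated assumptions, not the translator's actual code: the SomeInteger and SomeList classes below are bare stand-ins for the annotations, flatten_fields and emit_struct are hypothetical helper names, the "void **" item type is a placeholder, and the parent__child field naming simply follows the int1__ob_ival convention used earlier in the thread. It also assumes the list is never shared, which is the condition under which inlining it into its parent would be safe.

```python
class SomeInteger:
    """Stand-in for the annotation of an integer-valued field (illustrative)."""


class SomeList:
    """Stand-in for the annotation of an RPython list field (illustrative)."""


# Assumed C shape of an RPython list: a struct with a length and an items
# pointer. "void **" is a placeholder type chosen for this sketch.
RLIST_FIELDS = [("long", "length"), ("void **", "items_ptr")]


def flatten_fields(fields):
    """Turn (name, annotation) pairs into a flat list of (c_type, c_name).

    Integer fields become a plain "long". List fields are inlined into the
    parent: each field of the list structure becomes a field of the parent,
    named parent__child, so no pointer indirection remains.
    """
    c_fields = []
    for name, annotation in fields:
        if isinstance(annotation, SomeInteger):
            c_fields.append(("long", name))
        elif isinstance(annotation, SomeList):
            for c_type, sub_name in RLIST_FIELDS:
                c_fields.append((c_type, "%s__%s" % (name, sub_name)))
        else:
            raise TypeError("no layout rule for %r" % (annotation,))
    return c_fields


def emit_struct(struct_name, c_fields):
    """Render the flattened fields as the text of a C struct declaration."""
    lines = ["struct %s {" % struct_name]
    for c_type, name in c_fields:
        sep = "" if c_type.endswith("*") else " "
        lines.append("    %s%s%s;" % (c_type, sep, name))
    lines.append("};")
    return "\n".join(lines)


# The two fields of W_ListObject named in the text, with their annotations.
W_LISTOBJECT_FIELDS = [("ob_size", SomeInteger()), ("ob_item", SomeList())]

if __name__ == "__main__":
    print(emit_struct("W_ListObject", flatten_fields(W_LISTOBJECT_FIELDS)))
```

For W_ListObject this yields the three fields ob_size, ob_item__length and ob_item__items_ptr, which would play the roles of "size", "allocated" and "items_ptr".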
So the goal in the example was how to have the translator automatically turn a class like W_ListObject into a C structure that would have fields that look like "size", "allocated" and "items_ptr", and doing so using the knowledge that W_ListObject has two fields, ob_size and ob_item, with annotations SomeInteger() and SomeList() respectively.

A bientôt,

Armin.

From arigo at tunes.org Mon Sep 27 21:25:41 2004
From: arigo at tunes.org (Armin Rigo)
Date: Mon, 27 Sep 2004 20:25:41 +0100
Subject: [pypy-dev] Python 2.4a3
Message-ID: <20040927192541.GA26837@vicky.ecs.soton.ac.uk>

Hello,

Just out of interest (I haven't investigated yet), py.py loads in Python 2.4a3 but 'import dis' apparently sends it into an infinite loop. It also prints 'faking ' just after 'import dis', which it doesn't do with Python 2.3.3.

Funny.

Armin

From arigo at tunes.org Wed Sep 29 22:13:35 2004
From: arigo at tunes.org (Armin Rigo)
Date: Wed, 29 Sep 2004 21:13:35 +0100
Subject: [pypy-dev] Python 2.4a3
In-Reply-To: <20040927192541.GA26837@vicky.ecs.soton.ac.uk>
References: <20040927192541.GA26837@vicky.ecs.soton.ac.uk>
Message-ID: <20040929201335.GA29008@vicky.ecs.soton.ac.uk>

Hi,

On Mon, Sep 27, 2004 at 08:25:41PM +0100, Armin Rigo wrote:
> Just out of interest (I haven't investigated yet), py.py loads in Python 2.4a3
> but 'import dis' apparently sends it into an infinite loop. It also prints
> 'faking ' just after 'import dis', which it doesn't do with
> Python 2.3.3.

This is due to opcode.py, which in 2.4 uses string formatting to build opcode names, while in 2.3 it uses concatenation. From the diff:

< for op in range(256): opname[op] = '<' + `op` + '>'
---
> for op in range(256): opname[op] = '<%r>' % (op,)

As it happens, string formatting is *really* *slow* in PyPy now. Every one takes about 1 second! So importing opcode.py takes several minutes.

Quoting Michael, "time to make string formatting faster".
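The two spellings in the diff build the same strings; only the operation used differs. A small Python 3 sketch to check this (the function names are made up for illustration, and repr(op) stands in for the Python 2 backtick syntax, which Python 3 no longer accepts):

```python
def opnames_concat():
    """Placeholder opcode names built by concatenation, as in 2.3's opcode.py."""
    # repr(op) replaces the Python 2 spelling `op` used in the original diff.
    return ['<' + repr(op) + '>' for op in range(256)]


def opnames_format():
    """Placeholder opcode names built with %-formatting, as in 2.4's opcode.py."""
    return ['<%r>' % (op,) for op in range(256)]


if __name__ == "__main__":
    # Both variants are expected to agree entry by entry.
    print(opnames_concat() == opnames_format())
```

The formatting variant goes through the interpreter's '%' machinery 256 times at import, which is why its cost in PyPy shows up as soon as opcode.py is loaded.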
Armin

From hpk at trillke.net Wed Sep 29 22:21:25 2004
From: hpk at trillke.net (holger krekel)
Date: Wed, 29 Sep 2004 22:21:25 +0200
Subject: [pypy-dev] Python 2.4a3
In-Reply-To: <20040929201335.GA29008@vicky.ecs.soton.ac.uk>
References: <20040927192541.GA26837@vicky.ecs.soton.ac.uk> <20040929201335.GA29008@vicky.ecs.soton.ac.uk>
Message-ID: <20040929202125.GF19356@solar.trillke.net>

[Armin Rigo Wed, Sep 29, 2004 at 09:13:35PM +0100]
> < for op in range(256): opname[op] = '<' + `op` + '>'
> ---
> > for op in range(256): opname[op] = '<%r>' % (op,)

is there a deeper reason for this change, btw?

> As it happens, string formatting is *really* *slow* in PyPy now. Every one
> takes about 1 second! So importing opcode.py takes several minutes.
>
> Quoting Michael, "time to make string formatting faster".

which probably means to reimplement it at interpreter level ...
which might be cumbersome but then it might be straight forward ...

holger

From ianb at colorstudy.com Thu Sep 30 02:46:45 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed, 29 Sep 2004 19:46:45 -0500
Subject: [pypy-dev] utest, development and discussion
In-Reply-To: <200409260821.i8Q8LqOU032224@ratthing-b246.strakt.com>
References: <415672F3.9060808@colorstudy.com> <200409260821.i8Q8LqOU032224@ratthing-b246.strakt.com>
Message-ID: <415B5775.3040402@colorstudy.com>

Laura Creighton wrote:
>>Anyway, I've started working with it some. I've added one small feature
>>(dropping into pdb when an exception occurs), and there's sure to be
>>some more, particularly documentation. Where should I send patches?
>>Where should discussion occur? And maybe a website?
>
>
> Discussion belongs here. Patches too unless you want to get a project login,
> a process that Holger handles. There isn't a separate part of the pypy wiki -
> or the website for utest. Probably that should change. And documentation
> belongs here: http://codespeak.net/pypy/index.cgi?doc which we generate
> out of files in pypy/trunk/doc .
> You write your docs in ReST and a daemon
> comes along and makes html out of them for you.

Since std (I guess to be named py) isn't under pypy, I assume documentation should go in std/trunk/doc? Can it be set up that this is also turned into HTML?

--
Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org

From mwh at python.net Thu Sep 30 13:55:07 2004
From: mwh at python.net (Michael Hudson)
Date: Thu, 30 Sep 2004 12:55:07 +0100
Subject: [pypy-dev] Re: Python 2.4a3
References: <20040927192541.GA26837@vicky.ecs.soton.ac.uk> <20040929201335.GA29008@vicky.ecs.soton.ac.uk> <20040929202125.GF19356@solar.trillke.net>
Message-ID: <2mbrformxw.fsf@starship.python.net>

hpk at trillke.net (holger krekel) writes:

> [Armin Rigo Wed, Sep 29, 2004 at 09:13:35PM +0100]
>> < for op in range(256): opname[op] = '<' + `op` + '>'
>> ---
>> > for op in range(256): opname[op] = '<%r>' % (op,)
>
> is there a deeper reason for this change, btw?
>
>> As it happens, string formatting is *really* *slow* in PyPy now. Every one
>> takes about 1 second! So importing opcode.py takes several minutes.
>>
>> Quoting Michael, "time to make string formatting faster".
>
> which probably means to reimplement it at interpreter level ...
> which might be cumbersome but then it might be straight forward ...

Or work out why it's so painfully slow currently, via hotshot or whatever.

One could create a half-assed interpreter level implementation by just executing the current code at interpreter level (and probably changing a few little things, like calling space.str instead of str). It almost certainly wouldn't be RPython though (esp. the floating point stuff which uses long arithmetic; the rest might be close).

Not a lot of fun, though.

Cheers,
mwh

--
Important data should not be entrusted to Pinstripe, as it may eat it and make loud belching noises.
  -- from the announcement of the beta of "Pinstripe" aka. Redhat 7.0