From kevinar18 at hotmail.com Sun Aug 1 04:09:28 2010 From: kevinar18 at hotmail.com (Kevin Ar18) Date: Sat, 31 Jul 2010 22:09:28 -0400 Subject: [pypy-dev] FW: Would the following shared memory model be possible? In-Reply-To: References: , <20100727062702.GE12699@tunixman.com>, , , , , , , , , Message-ID: > > I have no idea what I did you warrant you hateful replies towards me, but > > they really are not appropriate (in public or private email). > > I had absolutely no intention of offending you, and am deeply sorry > for any offense that I may have caused you. I must admit, I'm rather surprised by your reply -- and also thank you. I'm sorry for the trouble I caused you with this. I had hoped for a good conversation about the issues related to Kamaelia, yet everytime I got a reply back, it seemed like you were mad at me for some unknown reason. As a simple example of what I mean. In your first email, you mentioned a lot of different programming styles related to FBP and Kamaelia. Since I am interested in parallel "research" I put those words into google and made a whole bookmark section so that I would have them for future study. When I replied back, I figured that this would be a good way to lighten the mood in the email, so I thanked you for the info and asked for any more links/ideas you might want to mention. A shared point of interest might be a good way to foster a nice friendly atmosphere. Unfortunately, I am assuming you must have misunderstood me, because instead of stirring up a friendly interest, I received several paragraphs about me being inconsiderate (not searching google for something) and putting an undue burden on you. At this point, it would be really unfair to talk about it further. I guess to sum things up, I got the impression that you were mad at me for some unknown reason: it was like each successive email was going further and further down hill -- and I didn't know why. However, in the end, I am glad that the whole situation could be resolved the way it has been. > I'll bow out at this point. I wouldn't want you to have to do that; your input can be very useful to people. I apologized, you apolgized.... Some stuff was cleared up, etc.... I don't think anybody here is holding a grude or going rehash the topic again (me and you included). You have very specific knowledge related to Kamaelia that could be useful to people exploring micro-threading implementations, parallel computing, etc.... --------------- Now, to change the topic slightly (and hopeful in a positive way). I'm not sure if it really matters to you, but I have been considering another possible way to make a parallel tasklet (like for FBP and Kamaelia) in PyPy... but I don't have 3+ months to spend ironing out the flaws, learning PyPy, writing an implemenation, etc.... ... and to be honest, I would not feel comfortable asking someone else (here or otherwise) to try and make something for my benefit. On another note... something that might actually interest you: I have done some work on a graphical front-end for FBP ... nothing super special, mind you, but I could keep you informed in the future if is something of interest to you. Anyways, hope this email turns out on a positive note for you and everyone else. Kevin > > I have no idea what I did you warrant you hateful replies towards me, but > > they really are not appropriate (in public or private email). > > I had absolutely no intention of offending you, and am deeply sorry > for any offense that I may have caused you. 
> > In my reply I merely wanted to flag that I don't have time to go into > everything (like most people), that asking questions in a public realm > is better because you may then get answers from multiple people, and > that people who appear to do some research first tend to get better > answers. I also tried to give an example, but that doesn't appear to > have been helpful. (I'm fallible like everyone else) > > My intention there was to be helpful and to explain why I have that > view of only replying on list, and it appears to have offended you > instead, and I apologise. (one person's direct and helpful speech in > one place can be a mortal insult somewhere else) > > After those couple of paragraphs, I tried to add to your discussion by > replying to your specific points which you asked about parallel > execution, noting places and examples where it is possible today. (to > varying degrees of satisfaction) I then also tried to answer your > point of "if something extra could be done, what would probably be > generally useful". To that I noted that *my* talk there was cheap, and > that execution was hard. > > Somehow along the way, my intent to try to be helpful to you has > resulted in offending and upsetting you, and for that I am truly sorry > - life is simply too short for people to upset each other, and in no > way was my post intended as "hateful", and once again, my apologies. > In future please assume good intentions - I assumed good intentions on > your part. > > I'll bow out at this point. > > Best Regards, > > > Michael. > > > > >> Date: Sat, 31 Jul 2010 02:08:49 +0100 > >> Subject: Re: [pypy-dev] FW: Would the following shared memory model be > >> possible? > >> From: sparks.m at gmail.com > >> To: kevinar18 at hotmail.com > >> CC: pypy-dev at codespeak.net > >> > >> On Thu, Jul 29, 2010 at 6:44 PM, Kevin Ar18 wrote: > >> > You brought up a lot of topics. I went ahead and sent you a private > >> > email. > >> > There's always lots of interesting things I can add to my list of things > >> > to > >> > learn about. :) > >> > >> Yes, there are lots of interesting things. I have a limited amount of > >> time however (I should be in bed, it's very late here, but I do /try/ > >> to reply to on-list mails), so cannot spoon feed you. Mailing me > >> directly rather than a (relevant) list precludes you getting answers > >> from someone other than me. Not being on lists also precludes you > >> getting answers to questions by chance. Changing emails and names in > >> email headers also makes keeping track of people hard... > >> > >> (For example you asked off list last year about Kamaelia's license > >> from a different email address. Since it wasn't searchable I > >> completely forgot. You also asked all sorts of questions but didn't > >> want the answers public, so I didn't reply. If instead you'd > >> subscribed to the list, and asked there, you'd've found out that > >> Kamaelia's license changed - to the Apache Software License v2 ...) > >> > >> If I mention something you find interesting, please Google first and > >> then ask publicly somewhere relevant. (the answer and question are > >> then googleable, and you're doing the community a service IMO if you > >> ask q's that way - if your question is somewhere relevant and shows > >> you've already googled prior work as far as you can... People are
always willing to help people who show willing to help themselves in > >> my experience.) > >> > >> >> just looks to me that you're tying yourself up in knots over things > >> >> that aren't problems, when there are some things which could be useful > >> >> (in practice) & interesting in this space. > >> > The particular issue in this situation is that there is no way to make > >> > Kamaelia, FBP, or other concurrency concepts run in parallel (unless you > >> > are > >> > willing to accept lots of overhead like with the multiprocessing > >> > queues). > >> > > >> > Since you have worked with Kamaelia code a lot... you understand a lot > >> > more > >> > about implementation details. Do you think the previous shared memory > >> > concept or something like it would let you make Kamaelia parallel? > >> > If not, can you think of any method that would let you make Kamaelia > >> > parallel? > >> > >> Kamaelia already CAN run components in parallel in different processes > >> (has been able to do so for quite some time) or on different > >> processors. Indeed, all you do is use a ProcessPipeline or > >> ProcessGraphline rather than Pipeline or Graphline, and the components > >> in the top level are spread across processes. I still view the code as > >> experimental, but it does work, and when needed is very useful. > >> > >> Kamaelia running on Iron Python can run on separate processors sharing > >> data efficiently (due to lack of GIL there) happily too. Threaded > >> components there do that naturally - I don't use IronPython, but it > >> does run on Iron Python. On windows this is easiest, though Mono works > >> just as well. > >> > >> I believe Jython also is GIL free, and Kamaelia's Axon runs there > >> cleanly too. As a result because Kamaelia is pure python, it runs > >> truly in parallel there too (based on hearing from people using > >> kamaelia on jython). Cpython is the exception (and a rather big one at > >> that). (Pypy has a choice IIUC) > >> > >> Personally, I think if PyPy worked with generators better (which is > >> why I keep an eye on PyPy) and cpyext was improved, it'd provide a > >> really compelling platform for me. (I was rather gutted at Europython > >> to hear that PyPy's generator support was still ... problematic) > >> > >> Regarding the *efficiency* and *enforcement* of the approach taken, I > >> feel you're chasing the wrong tree, but let's go there. > >> > >> What approach does baseline (non-Iron Python running) kamaelia take > >> for multi-process work? > >> > >> For historical reasons, it builds on top of pprocess rather than > >> multiprocessing module based. This means for interprocess > >> communications objects are pickled before being sent over operating > >> system pipes.
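As a minimal, stdlib-only sketch of the mechanism just described (POSIX-only because of os.fork, and deliberately not Kamaelia or pprocess code): the parent pickles an object and writes the bytes into an OS pipe, and the child process unpickles its own private copy on the other end.

    import os, pickle

    message = {"frame": 42, "samples": [1.0, 2.0, 3.0]}

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # child: read until EOF, then unpickle a *copy* of the object
        os.close(write_fd)
        reader = os.fdopen(read_fd, "rb")
        copy = pickle.loads(reader.read())
        reader.close()
        print "child received:", copy
        os._exit(0)
    else:
        # parent: pickle the object and push the bytes through the pipe
        os.close(read_fd)
        writer = os.fdopen(write_fd, "wb")
        writer.write(pickle.dumps(message))
        writer.close()
        os.waitpid(pid, 0)

Every message pays the serialisation cost on one side and the deserialisation cost on the other, which is the overhead discussed next.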
> >> > >> This provides an obvious communications overhead - and this isn't > >> really kamaelia specific at this point. > >> > >> However, shifting data from one CPU to another is expensive, and only > >> worth doing in some circumstances. (Consider a machine with several > >> physical CPUs - each has a local CPU cache, and the data needs to be > >> transferred from one to another, which is why partly people worry > >> about thread/CPU affinity etc) > >> > >> Basically, if you can manage it, you don't want to shift data between > >> CPUs, you want to partition the processing. > >> > >> ie you may want to start caring about the size of messages and number > >> of messages going between processes. Sending small and few between > >> processes is going to be preferable to sending large and many for > >> throughput purposes. > >> > >> In the case of small and few, the approach of pickling and sending > >> across OS pipes isn't such a bad idea. It works. > >> > >> If you do want to share data between CPUs, and it sounds like you do, > >> then most OSs already provide a means of doing that - threads. The > >> conventions people use for using threads are where they become > >> unpicked, but as a mechanism, threads do generally work, and work > >> well. > >> > >> As well as channels/boxes, you can use an STM approach, such as than > >> in Axon.STM ... > >> * http://www.kamaelia.org/STM.html > >> * > >> http://code.google.com/p/kamaelia/source/browse/trunk/Code/Python/Bindings/STM/ > >> > >> ...which is logically very similar to version control for variables. A > >> downside of STM (at least with this approach) however, is that for it > >> to work, you need either copy on write semantics for objects, or full > >> copying of objects or similar. Personally I use a biological metaphor > >> here, in that channels/boxes and components, and similar perform a > >> similar function to axons and neurons in the body, and that STM is > >> akin to the hormonal system for maintaining and controlling system > >> state. (I modelled biological tree growth many moons ago) > >> > >> Anyhow, coming back to threads, that brings us back to python, and > >> implementations with a GIL, and those without. > >> > >> For implementations with a GIL, you then have a choice: do I choose to > >> try and implement a memory model that _enforces_ data locality? that > >> is if a piece of data is in use inside a single "process" or "thread" > >> (from hereon I'll use "task" as a generic phrase) that trying to use > >> it inside another causes a problem for the task attempting to breach > >> the model. > >> > >> In order to enforce this, I personally believe you'd need to use > >> multiple processes, and only share data through dedicated code > >> managing shared memory. You could of course do this outside user code. > >> To do this you'd need an abstraction that made sense, and something > >> like stackless' channels or kamaelia's (in/out) box model makes sense > >> there. (The CELL API uses a mailbox metaphor as well for reference) > >> > >> In that case, you have a choice. You either copy the data into shared > >> memory, or you share the data in situ. The former gives you back > >> precisely the same overhead previously described, or the latter > >> fragments your memory (since you can no longer access it). You could > >> also have compaction. > >> > >> However, personally, I think any possible benefits here are outweighed > >> by the costs and complexity. > >> > >> The alternative is to _encourage_ data locality. 
That is encourage the > >> usage and sharing of data such that whilst you could share data > >> between tasks and cause corruption that the common way of using the > >> system discourages such actions. In essence that's what I try to do in > >> Kamaelia, and it seems to work. Specifically, the model says: > >> > >> * If I take a piece of data from an inbox, I own it and can do anything > >> with it that I like. If you think of a physical piece of paper and > >> I take it from an intray, then that really is the case. > >> > >> * If I put a piece of data in an outbox, I no longer own it and should > >> not attempt to do anything more with it. Again, using a physical > >> metaphor, and naming scheme helps here. In particular, if I put a > >> piece of paper in the post, I can no longer modify it. How it gets > >> to its recipient is not my concern either. > >> > >> In practice this does actually work. If you add in immutable tuples, > >> and immutable strings then it becomes a lot clearer how this can work. > >> > >> Is there a risk here of accidental modification? Yes. However, the > >> size and general simplicity of components tends to lead to such > >> problems being picked up early. It also enables component level > >> acceptance tests. (We tend to build small examples of usage, which in > >> turn effectively form acceptance tests) > >> > >> [ An alternative is to make the "send" primitive make a copy on send. > >> That would be quite an overhead, and also limit the types of data you > >> can send. ] > >> > >> In practical terms, it works. (Stackless proves this as well IMO, > >> since despite some differences, there's also lots of similarities) > >> > >> The other question that arises, is "isn't the GIL a problem with > >> threads?". Well, the answer to that really depends on what you're > >> doing. David Beazely's talk on what happens on mixing different sorts > >> of threads shows that it isn't ideal, and if you're hitting that > >> behaviour, then actually switching to real processes makes sense. > >> However if you're doing CPU intensive work inside a C extension which > >> releases the GIL (eg numpy), then it's less of an issue in practice. > >> Custom extensions can do the same. > >> > >> So, for example, picking something which I know colleagues [1] at work > >> do, you can use a DVS broadcast capture card to capture video frames, > >> pass those between threads which are doing processing on them, and > >> inside those threads use c extensions to process the data efficiently > >> (since image processing does take time...), and those release the GIL > >> boosting throughput. > >> > >> [1] On this project : > >> http://www.bbc.co.uk/rd/projects/2009/10/i3dlive.shtml > >> > >> So, that makes it all sound great - ie things can, after various > >> fashions, run in parallel on various versions of python, to practical > >> benefit. But obviously it could be improved. > >> > >> Personally, I think the project most likely to make a difference here > >> is actually pypy. Now, talk is very cheap, and easy, and I'm not > >> likely to implement this, so I'll aim to be brief. Execution is hard. > >> > >> In particular, what I think is most likely to be beneficial is > >> something _like_ this: > >> > >> Assume pypy runs without a GIL. Then allow the creation of a green > >> process. 
A green process is implemented using threads, but with data > >> created on the heap such that it defaults to being marked private to > >> the thread (ie ala thread local storage, but perhaps implemented > >> slightly differently - via references from the thread local storage > >> into the heap) rather than shared. Sharing between green processes > >> (for channels or boxes) would "simply" be detagged as being owned by > >> one thread, and passed to another. > >> > >> In particular this would mean that you need a mechanism for doing > >> this. Simply attempting to call another green process (or thread) from > >> another with mutable data types would be sufficient to raise the > >> equivalent of a segmentation fault. > >> > >> Secondly, improve cpyext to the extent that each cpython extension > >> gets it's own version of the GIL. (ie each extension runs with its own > >> logical runtime, and thinks that it has its own GIL which it can lock > >> and release. In practice it's faked by the PyPy runtime. This is > >> essentially similar conceptually to creating green processes. > >> > >> It's worth considering that the Linux kernel went through similar > >> changes, in that in the 2.0 days there was a large single big lock, > >> which was replaced by ever granular locks. I personally think that > >> since there are so many extensions that rely on the existence of the > >> GIL simply waving a wand to get rid of it isn't likely. However > >> logically providing a GIL per C-Extension may be plausible, and _may_ > >> be sufficient. > >> > >> However, I don't know - it might well not - I've not looked at the > >> code, and talk is cheap - execution is hard. > >> > >> Hopefully the above (cheap :) comments are in some small way useful. > >> > >> Regards, > >> > >> > >> Michael. > > From holger at merlinux.eu Sun Aug 1 13:50:29 2010 From: holger at merlinux.eu (holger krekel) Date: Sun, 1 Aug 2010 13:50:29 +0200 Subject: [pypy-dev] py.test/debian and pypy issue Message-ID: <20100801115029.GL1914@trillke.net> Hi all, just for you information: if you are running Debian (e.g. Ubuntu 10.04) and install "py.test" (codespeak-python-lib) from there you get the 9-month old py.test-1.1 which cannot run PyPy's trunk-test suite. Solutions: * uninstall the debian version. install 'py' from PyPI with e.g. "pip install py" or "easy_install py" - this should get you the 1.3.3 version which should work fine. * uninstall the debian version, don't install any other and then alias "py.test" to "trunk/pypy/py/bin/py.test" which means you use the pypy-included py version, currently version 1.3.1 which is also the version used in nightly test runs etc. sidenote: Fedora 13 ships 1.3.2 and Gentoo ships 1.3.3 so you mostly only get the issues on debian-based systems, i guess. best, holger From glavoie at gmail.com Sun Aug 1 22:04:29 2010 From: glavoie at gmail.com (Gabriel Lavoie) Date: Sun, 1 Aug 2010 16:04:29 -0400 Subject: [pypy-dev] pre-emptive micro-threads utilizing shared memory message passing? In-Reply-To: References: Message-ID: Sorry for the late answer, I was unavailable in the last few days. About send() and receive(), it depends on if the communication is local or not. For a local communication, anything can be passed since only the reference is sent. This is the base model for Stackless channels. For a remote communication (between two interpreters), any picklable object (a copy will then be made) and it includes channels and tasklets (for which a reference will automatically be created). 
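As a minimal sketch of the local case just described (plain "stackless" module code, runnable on Stackless Python or with PyPy's stackless module; the function names are invented for illustration): channel.send() hands the receiving tasklet a reference to the very same object, so nothing is copied or pickled.

    import stackless

    def producer(channel):
        data = {"payload": [1, 2, 3]}
        channel.send(data)             # blocks until a receiver is ready

    def consumer(channel):
        received = channel.receive()   # gets a reference, not a copy
        received["payload"].append(4)  # mutates the sender's object in place
        print "consumer sees:", received

    ch = stackless.channel()
    stackless.tasklet(producer)(ch)
    stackless.tasklet(consumer)(ch)
    stackless.run()

In the remote case described above, the same send() would instead involve pickling a copy, which is where the extra cost and the copy semantics come in.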
The use of the PyPy proxy object space is to make remote communication more Stackless like by passing object by reference. If a ref_object is made, only a reference will be passed when a tasklet is moved or the object is sent on a channel. The object always resides where it was created. A move() operation will also be implemented on those objects so they can be moved around like tasklets. I hope it helps, Gabriel 2010/7/29 Kevin Ar18 > > > Hello Kevin, > > I don't know if it can be a solution to your problem but for my > > Master Thesis I'm working on making Stackless Python distributed. What > > I did is working but not complete and I'm right now in the process of > > writing the thesis (in french unfortunately). My code currently works > > with PyPy's "stackless" module onlyis and use some PyPy specific > > things. Here's what I added to Stackless: > > > > - Possibility to move tasklets easily (ref_tasklet.move(node_id)). A > > node is an instance of an interpreter. > > - Each tasklet has its global namespace (to avoid sharing of data). The > > state is also easier to move to another interpreter this way. > > - Distributed channels: All requests are known by all nodes using the > > channel. > > - Distributed objets: When a reference is sent to a remote node, the > > object is not copied, a reference is created using PyPy's proxy object > > space. > > - Automated dependency recovery when an object or a tasklet is loaded > > on another interpreter > > > > With a proper scheduler, many tasklets could be automatically spread in > > multiple interpreters to use multiple cores or on multiple computers. A > > bit like the N:M threading model where N lightweight threads/coroutines > > can be executed on M threads. > > Was able to have a look at the API... > If others don't mind my asking this on the mailing list: > > * .send() and .receive() > What type of data can you send and receive between the tasklets? Can you > pass entire Python objects? > > * .send() and .receive() memory model > When you send data between tasklets (pass messages) or whateve you want to > call it, how is this implemented under the hood? Does it use shared memory > under the hood or does it involve a more costly copying of the data? I > realize that if it is on another machine you have to copy the data, but what > about between two threads? You mentioned PyPy's proxy object.... guess I'll > need to read up on that. > _______________________________________________ > pypy-dev at codespeak.net > http://codespeak.net/mailman/listinfo/pypy-dev > -- Gabriel Lavoie glavoie at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex.gaynor at gmail.com Mon Aug 2 04:11:15 2010 From: alex.gaynor at gmail.com (Alex Gaynor) Date: Sun, 1 Aug 2010 22:11:15 -0400 Subject: [pypy-dev] I broke stackless Message-ID: The work I did changing CALL_METHOD to support keyword arguments moved some rstack.resume_point calls around, and it seems to have inadvertantly broken stackless on trunk. The latest translation fail can be found here: http://buildbot.pypy.org/builders/pypy-c-stackless-app-level-linux-x86-32/builds/597/steps/translate/logs/stdio. Anyone have a suggestion as to what exactly I need to do to get this working? Alex -- "I disapprove of what you say, but I will defend to the death your right to say it." -- Voltaire "The people's good is the highest law." 
-- Cicero "Code can always be simpler than you think, but never as simple as you want" -- Me From benjamin at python.org Mon Aug 2 04:52:34 2010 From: benjamin at python.org (Benjamin Peterson) Date: Sun, 1 Aug 2010 21:52:34 -0500 Subject: [pypy-dev] I broke stackless In-Reply-To: References: Message-ID: 2010/8/1 Alex Gaynor : > The work I did changing CALL_METHOD to support keyword arguments moved > some rstack.resume_point calls around, and it seems to have > inadvertantly broken stackless on trunk. ?The latest translation fail > can be found here: > http://buildbot.pypy.org/builders/pypy-c-stackless-app-level-linux-x86-32/builds/597/steps/translate/logs/stdio. > ?Anyone have a suggestion as to what exactly I need to do to get this > working? Revert it! :) -- Regards, Benjamin From todd.a.anderson at intel.com Tue Aug 3 19:29:49 2010 From: todd.a.anderson at intel.com (Anderson, Todd A) Date: Tue, 3 Aug 2010 10:29:49 -0700 Subject: [pypy-dev] Percentage Python as RPython. Message-ID: <9662F248D13E8C45B097A77F005E9729B0C39B74@orsmsx503.amr.corp.intel.com> Sorry if this has been asked before. I did some searching of the archive and didn't see anything but I might have missed it. I am curious what percentage of real-world Python programs in use are also RPython programs. I know that the FAQ says that the translator is not intended for Python programs in general but only for the PyPy interpreter itself but I've also seen a few mentions (on other sites) of attempting to translate Python to C. I've been thinking about adding a backend to the translator but would only want to do so if a significant amount of Python programs could use it. thanks, Todd -------------- next part -------------- An HTML attachment was scrubbed... URL: From fijall at gmail.com Tue Aug 3 20:52:54 2010 From: fijall at gmail.com (Maciej Fijalkowski) Date: Tue, 3 Aug 2010 20:52:54 +0200 Subject: [pypy-dev] Percentage Python as RPython. In-Reply-To: <9662F248D13E8C45B097A77F005E9729B0C39B74@orsmsx503.amr.corp.intel.com> References: <9662F248D13E8C45B097A77F005E9729B0C39B74@orsmsx503.amr.corp.intel.com> Message-ID: On Tue, Aug 3, 2010 at 7:29 PM, Anderson, Todd A wrote: > Sorry if this has been asked before. ?I did some searching of the archive > and didn?t see anything but I might have missed it. > > > > I am curious what percentage of real-world Python programs in use are also > RPython programs. ?I know that the FAQ says that the translator is not > intended for Python programs in general but only for the PyPy interpreter > itself but I?ve also seen a few mentions (on other sites) of attempting to > translate Python to C.? I?ve been thinking about adding a backend to the > translator but would only want to do so if a significant amount of Python > programs could use it. > 0 - 0.5% (generally, none. You write programs for RPython in a different manner). > > > thanks, > > > > Todd > > _______________________________________________ > pypy-dev at codespeak.net > http://codespeak.net/mailman/listinfo/pypy-dev > From ademan555 at gmail.com Tue Aug 3 22:08:07 2010 From: ademan555 at gmail.com (Dan Roberts) Date: Tue, 3 Aug 2010 13:08:07 -0700 Subject: [pypy-dev] Percentage Python as RPython. 
In-Reply-To: <9662F248D13E8C45B097A77F005E9729B0C39B74@orsmsx503.amr.corp.intel.com> References: <9662F248D13E8C45B097A77F005E9729B0C39B74@orsmsx503.amr.corp.intel.com> Message-ID: Hi Todd, I'm not sure what your goals are, but my position is that if you write a translator backend and a JIT backend (please do) you can have fast (and improving) python on platform X. What were you hoping to target with your backend? Cheers, Dan On Aug 3, 2010 10:43 AM, "Anderson, Todd A" wrote: Sorry if this has been asked before. I did some searching of the archive and didn?t see anything but I might have missed it. I am curious what percentage of real-world Python programs in use are also RPython programs. I know that the FAQ says that the translator is not intended for Python programs in general but only for the PyPy interpreter itself but I?ve also seen a few mentions (on other sites) of attempting to translate Python to C. I?ve been thinking about adding a backend to the translator but would only want to do so if a significant amount of Python programs could use it. thanks, Todd _______________________________________________ pypy-dev at codespeak.net http://codespeak.net/mailman/listinfo/pypy-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From bhartsho at yahoo.com Wed Aug 4 10:21:10 2010 From: bhartsho at yahoo.com (Hart's Antler) Date: Wed, 4 Aug 2010 01:21:10 -0700 (PDT) Subject: [pypy-dev] demoting method, cannot follow, call result degenerated Message-ID: <267005.44211.qm@web114012.mail.gq1.yahoo.com> I'm still struggling to learn all the rules of RPython, i have read the coding guide, and the PDF's PyGirl and Ancona's RPython paper, but still i feel i'm not fully grasping everything. I have a function that returns different classes that all share a common base class. It works until i introduce a new subclass that has some methods of the same name. Then i get the demotion, can not follow, degenerated error. I googled, but all i can find is an IRC log where Fijal seems to taking talking about my problem. http://www.tismer.com/pypy/irc-logs/pypy/%23pypy.log.20070125 pedronis: if function can return (in rpython) set of classes with common superclass, than all methods that I call later must be defined on that superclass, right? [11:30] [15:01] yes, unless you assert a specific subclass So i just need to use an assert statement before the function return, and assert the class i am returning? I am blogging about my progress while learning RPython, i have posted about meta-programming in Rpython which is a new concept to me. http://pyppet.blogspot.com/2010/08/meta-programming-in-rpython.html -brett From fijall at gmail.com Wed Aug 4 10:25:59 2010 From: fijall at gmail.com (Maciej Fijalkowski) Date: Wed, 4 Aug 2010 10:25:59 +0200 Subject: [pypy-dev] demoting method, cannot follow, call result degenerated In-Reply-To: <267005.44211.qm@web114012.mail.gq1.yahoo.com> References: <267005.44211.qm@web114012.mail.gq1.yahoo.com> Message-ID: Hey. If at any place in code you want to call methods on a thing that can't be proven to be of a specific subclass, they have to be defined on a superclass (even dummy versions). 
If you are however sure that this object will be of a specific subclass, write: assert isinstance(x, MySubclass) x.specific_method that's fine On Wed, Aug 4, 2010 at 10:21 AM, Hart's Antler wrote: > I'm still struggling to learn all the rules of RPython, i have read the coding guide, and the PDF's PyGirl and Ancona's RPython paper, but still i feel i'm not fully grasping everything. > > I have a function that returns different classes that all share a common base class. ?It works until i introduce a new subclass that has some methods of the same name. ?Then i get the demotion, can not follow, degenerated error. > > I googled, but all i can find is an IRC log where Fijal seems to taking talking about my problem. > http://www.tismer.com/pypy/irc-logs/pypy/%23pypy.log.20070125 > > pedronis: if function can return (in rpython) set of classes with common superclass, than all methods that I call later must be defined on that superclass, right? > > [11:30] [15:01] yes, unless you assert a specific subclass > > So i just need to use an assert statement before the function return, and assert the class i am returning? > > I am blogging about my progress while learning RPython, i have posted about meta-programming in Rpython which is a new concept to me. > > http://pyppet.blogspot.com/2010/08/meta-programming-in-rpython.html > > -brett > > > > _______________________________________________ > pypy-dev at codespeak.net > http://codespeak.net/mailman/listinfo/pypy-dev > From bhartsho at yahoo.com Thu Aug 5 03:04:00 2010 From: bhartsho at yahoo.com (Hart's Antler) Date: Wed, 4 Aug 2010 18:04:00 -0700 (PDT) Subject: [pypy-dev] Percentage Python as RPython. Message-ID: <805862.40025.qm@web114009.mail.gq1.yahoo.com> Todd, I think you want a frontend, not a backend. The frontend would take in normal Python and convert it to RPython. RPython seems to have infinite meta-programming possibilities, so its just a matter of how hard would it be to make the meta frontend. Probably too hard since python is so dynamic, but maybe its possible with a new subset of Python halfway to RPython, developers would then only have to port to Not-So-Restricted-Python, and then the frontend does the final job of converting to RPython. -brett From bhartsho at yahoo.com Thu Aug 5 03:14:10 2010 From: bhartsho at yahoo.com (Hart's Antler) Date: Wed, 4 Aug 2010 18:14:10 -0700 (PDT) Subject: [pypy-dev] demoting method, cannot follow, call result degenerated In-Reply-To: Message-ID: <836035.68548.qm@web114016.mail.gq1.yahoo.com> Thanks for clarifying Fijal, putting dummy functions on the base class fixes the demotion errors. But now i have a new problem, from the bookkeeper, unpackiterable. pypy.annotation.bookkeeper.CallPatternTooComplex': '*' argument must be SomeTuple .. v2 = call_args(v0, ((0, (), True, False)), v1) .. '(rbpy:1)BPY_Object_MESH.GET_location' I checked the object, instead of SomeTuple it is SomeObject. I'm trying to understand what causes the CallPatternTooComplex error, i can not reproduce it with a simple model that is close to what my actual code is doing. class T(object): def hi( self, *args ): pass class TA( T ): def hi( self, a,b,c ): pass class TB( T ): def hi( self, y ): pass def pypy_entrypoint(): t = T() ta = TA() tb = TB() ta.hi(1,2,'x') tb.hi() tb.hi('xxx') print 'too complex test' the above translates just fine, no TooComplex error. 
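Pulling together the two fixes suggested earlier in this thread for the demotion error (a dummy method on the common superclass, and an isinstance assert to narrow the annotation before calling a subclass-only method), here is a small RPython-style sketch; the class and method names are invented for illustration and are not taken from the code discussed above.

    class Base(object):
        def name(self):                # dummy version so every value the
            return "base"              # annotator sees has this method

    class Mesh(Base):
        def name(self):
            return "mesh"
        def vertex_count(self):        # exists only on the subclass
            return 8

    def load(kind):
        if kind == "mesh":
            return Mesh()              # result is annotated as Base-or-Mesh
        return Base()

    def entry_point(argv):
        kind = "base"
        if len(argv) > 1:
            kind = argv[1]
        obj = load(kind)
        print obj.name()               # fine: declared on the superclass
        if kind == "mesh":
            assert isinstance(obj, Mesh)   # narrows the annotation to Mesh
            print obj.vertex_count()       # now allowed
        return 0

    def target(driver, args):
        return entry_point, None

Without the assert (or without the dummy name() on the base class) the annotator reports exactly the kind of demotion error described above.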
-brett --- On Wed, 8/4/10, Maciej Fijalkowski wrote: > From: Maciej Fijalkowski > Subject: Re: [pypy-dev] demoting method, cannot follow, call result degenerated > To: "Hart's Antler" > Cc: pypy-dev at codespeak.net > Date: Wednesday, 4 August, 2010, 1:25 AM > Hey. > > If at any place in code you want to call methods on a thing > that can't > be proven to be of a specific subclass, they have to be > defined on a > superclass (even dummy versions). > > If you are however sure that this object will be of a > specific subclass, write: > assert isinstance(x, MySubclass) > x.specific_method > > that's fine > > On Wed, Aug 4, 2010 at 10:21 AM, Hart's Antler > wrote: > > I'm still struggling to learn all the rules of > RPython, i have read the coding guide, and the PDF's PyGirl > and Ancona's RPython paper, but still i feel i'm not fully > grasping everything. > > > > I have a function that returns different classes that > all share a common base class. ?It works until i introduce > a new subclass that has some methods of the same name. > ?Then i get the demotion, can not follow, degenerated > error. > > > > I googled, but all i can find is an IRC log where > Fijal seems to taking talking about my problem. > > http://www.tismer.com/pypy/irc-logs/pypy/%23pypy.log.20070125 > > > > pedronis: if function can return (in > rpython) set of classes with common superclass, than all > methods that I call later must be defined on that > superclass, right? > > > > [11:30] [15:01] yes, > unless you assert a specific subclass > > > > So i just need to use an assert statement before the > function return, and assert the class i am returning? > > > > I am blogging about my progress while learning > RPython, i have posted about meta-programming in Rpython > which is a new concept to me. > > > > http://pyppet.blogspot.com/2010/08/meta-programming-in-rpython.html > > > > -brett > > > > > > > > _______________________________________________ > > pypy-dev at codespeak.net > > http://codespeak.net/mailman/listinfo/pypy-dev > > > From kevinar18 at hotmail.com Fri Aug 6 04:30:27 2010 From: kevinar18 at hotmail.com (Kevin Ar18) Date: Thu, 5 Aug 2010 22:30:27 -0400 Subject: [pypy-dev] pre-emptive micro-threads utilizing shared memory message passing? In-Reply-To: References: , , , Message-ID: Note: Gabriel, do you think we should discuss this on another mailing list (or in private) as I'm not sure this related to PyPy dev anymore? Anywyas, what are your future plans for the project? Is it just an experiment for school ... maybe in the hopes that others would maintaining it if it was found to be interesting? ... are you planning actual future development, maintenance, promotion of it yourself? ----------- On a personal note... the concept has a lot of similarities to what I am exploring. However, I would have to make so many additional modifications. Perhaps you can give some thoughts on whether it would take me a long time to add such things? Some examples: * Two additional message passing styles (in addition to your own) Queues - multiple tasklets can push onto queue, only one tasklet can pop.... multiple tasklets can access the property to find out if there is any data in the queue. Queues can be set to an infite size or set with a max # of entries allowed. Streams - I'm not sure of the exact name, but kind of like an infinite stream/buffer ... useful for passing infinite amounts of data. Only one tasklet can write/add data. Only one tasklet can read/extract data. 
* Message passing When you create a tasklet, you assign a set number of queues or streams to it (it can have many) and whether they extract data from them or write to them (they can only either extract or write to it as noted above). The tasklet's global namespace has access to these queues or streams and can extract or add data to them. In my case, I look at message passing from the perspective of the tasklet. A tasklet can either be assigned a certain number of "in ports" and a certain number of "out ports." In this case the "in ports" are the .read() end of a queue or stream and the "out ports" are the .send() part of a queue or stream. * Scheduler For the scheduler, I would need to control when a tasklet runs. Currently, I am thinking that I would look at all the "in ports" that a tasklet has and make sure each one has some data. Only then would the tasklet be scheduled to run by the scheduler. ------------ On another note, I am curious how you handled the issue of "nested" objects. Consider send() and receive() that you use to pass objects around in your project. Am I correct in that these objects cannot contain references outside of themselves? Also, how do you handle extracting out of the tree and making sure there are not references outside the object? For example, consider the following object, where "->" means it has a reference to that object Object 1 -> Object 2 Object 2 -> Object 3 Object 2 -> Object 4 Object 4 -> Object 2 Now, let's say I have a tasklet like the following: .... -> incoming data = pointer/reference to Object 1 1. read incoming data (get Object 1 reference) 2. remove Object 3 3. send Object 3 to tasklet B 4. send Object 1 to tasklet C Result: tasklet B now has this object: pointer/reference to Object 1, which contains the following tree: Object 1 -> Object 2 Object 2 -> Object 4 Object 4 -> Object 2 tasklet C now has this object: pointer/reference to Object 3, which contains the following tree: Object 3 On the other hand, consider the following scenario: 1. read incoming data (get Object 1 reference) 2. remove Object 4 ERROR: this would not be possible, as it refers to Object 2 > Sorry for the late answer, I was unavailable in the last few days. > > About send() and receive(), it depends on if the communication is local > or not. For a local communication, anything can be passed since only > the reference is sent. This is the base model for Stackless channels. > For a remote communication (between two interpreters), any picklable > object (a copy will then be made) and it includes channels and tasklets > (for which a reference will automatically be created). > > The use of the PyPy proxy object space is to make remote communication > more Stackless like by passing object by reference. If a ref_object is > made, only a reference will be passed when a tasklet is moved or the > object is sent on a channel. The object always resides where it was > created. A move() operation will also be implemented on those objects > so they can be moved around like tasklets. > > I hope it helps, > > Gabriel > > 2010/7/29 Kevin Ar18> > >> Hello Kevin, >> I don't know if it can be a solution to your problem but for my >> Master Thesis I'm working on making Stackless Python distributed. What >> I did is working but not complete and I'm right now in the process of >> writing the thesis (in french unfortunately). My code currently works >> with PyPy's "stackless" module onlyis and use some PyPy specific >> things. 
Here's what I added to Stackless: >> >> - Possibility to move tasklets easily (ref_tasklet.move(node_id)). A >> node is an instance of an interpreter. >> - Each tasklet has its global namespace (to avoid sharing of data). The >> state is also easier to move to another interpreter this way. >> - Distributed channels: All requests are known by all nodes using the >> channel. >> - Distributed objets: When a reference is sent to a remote node, the >> object is not copied, a reference is created using PyPy's proxy object >> space. >> - Automated dependency recovery when an object or a tasklet is loaded >> on another interpreter >> >> With a proper scheduler, many tasklets could be automatically spread in >> multiple interpreters to use multiple cores or on multiple computers. A >> bit like the N:M threading model where N lightweight threads/coroutines >> can be executed on M threads. > > Was able to have a look at the API... > If others don't mind my asking this on the mailing list: > > * .send() and .receive() > What type of data can you send and receive between the tasklets? Can > you pass entire Python objects? > > * .send() and .receive() memory model > When you send data between tasklets (pass messages) or whateve you want > to call it, how is this implemented under the hood? Does it use shared > memory under the hood or does it involve a more costly copying of the > data? I realize that if it is on another machine you have to copy the > data, but what about between two threads? You mentioned PyPy's proxy > object.... guess I'll need to read up on that. > _______________________________________________ > pypy-dev at codespeak.net > http://codespeak.net/mailman/listinfo/pypy-dev > > > > -- > Gabriel Lavoie > glavoie at gmail.com From glavoie at gmail.com Fri Aug 6 05:31:15 2010 From: glavoie at gmail.com (Gabriel Lavoie) Date: Thu, 5 Aug 2010 23:31:15 -0400 Subject: [pypy-dev] pre-emptive micro-threads utilizing shared memory message passing? In-Reply-To: References: Message-ID: I don't mind replying to the mailing list unless it annoys someone? Maybe some people could be interested by this discussion. You have a lot of questions! :) My answers are inline. 2010/8/5 Kevin Ar18 > > Note: Gabriel, do you think we should discuss this on another mailing list > (or in private) as I'm not sure this related to PyPy dev anymore? > > > Anywyas, what are your future plans for the project? > Is it just an experiment for school ... maybe in the hopes that others > would maintaining it if it was found to be interesting? > ... > are you planning actual future development, maintenance, promotion of it > yourself? > Based on the interest and time I'll and and other people will have I plan to debug this as much as possible. If people are interested to join in after my thesis, I'll be more than open to welcome then in the project. Right now, I'm writing my report and I'm also looking for a job. I won't have much time to touch again to the code before next month to prepare it for my presentation, along with a lot of examples and use cases. > > > ----------- > > On a personal note... the concept has a lot of similarities to what I am > exploring. However, I would have to make so many additional modifications. > Perhaps you can give some thoughts on whether it would take me a long time > to add such things? > Allright, my plan was to make all the needed lower level constructs that can be used to build more complex things. 
For example, a mix of tasklet and sync channels could be wrapped in an API to create async channels. I know this is far from complete and I have a few ideas on how it could be improved in the future but it's currently not needed for my project. For now, the idea was to stay as close as possible to standard Stackless Python and only add the needed APIs and functionalities to support distributing tasklets between multiple interpreters. > > Some examples: > > * Two additional message passing styles (in addition to your own) > Queues - multiple tasklets can push onto queue, only one tasklet can > pop.... multiple tasklets can access the property to find out if there is > any data in the queue. Queues can be set to an infite size or set with a max > # of entries allowed. This could easily be implemented using a standard channel and by starting multiple tasklets to send data. With some helper methods on a channel it could be possible to know how many tasklets are waiting to send their data. A channel already have a built-in queue for send/receive requests. This queue contains a list of all tasklets waiting for a send/receive operation. Tasklets are supposed to be lightweight enough to support something like this. > Streams - I'm not sure of the exact name, but kind of like an infinite > stream/buffer ... useful for passing infinite amounts of data. Only one > tasklet can write/add data. Only one tasklet can read/extract data. > Like a UNIX pipe()? Async? Again, some code wrapping standard channels could be used for this. > > > * Message passing > When you create a tasklet, you assign a set number of queues or streams to > it (it can have many) and whether they extract data from them or write to > them (they can only either extract or write to it as noted above). The > tasklet's global namespace has access to these queues or streams and can > extract or add data to them. > > In my case, I look at message passing from the perspective of the tasklet. > A tasklet can either be assigned a certain number of "in ports" and a > certain number of "out ports." In this case the "in ports" are the .read() > end of a queue or stream and the "out ports" are the .send() part of a queue > or stream. > > Sorry, I don't really understand what you're trying to explain here. Maybe an example could be helpful? :) > > * Scheduler > For the scheduler, I would need to control when a tasklet runs. Currently, > I am thinking that I would look at all the "in ports" that a tasklet has and > make sure each one has some data. Only then would the tasklet be scheduled > to run by the scheduler. > > Couldn't all those ports (channels) be read one at a time, then the processing could be done? I don't exactly see the need to play with the scheduler. Channels are blocking. A tasklet will be anyway unscheduled when it tries to read on a channel in which no data is available. > > > ------------ > On another note, I am curious how you handled the issue of "nested" > objects. Consider send() and receive() that you use to pass objects around > in your project. Am I correct in that these objects cannot contain > references outside of themselves? Also, how do you handle extracting out of > the tree and making sure there are not references outside the object? > Right now, I did not really dig too far with this problem. With a local communication, a reference to the object is sent through a channel. The receiver tasklet will have the same access to the object and all the sub-object as the sender tasklet. 
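Going back to the queue idea a few paragraphs up, here is a rough sketch of the kind of wrapper suggested there: a multi-producer queue layered on top of a plain Stackless channel. The class and its methods are invented for illustration (they are not part of Stackless or of the distributed-Stackless prototype), and the bounded/max-size case is left out.

    import stackless

    class ChannelQueue(object):
        """Many tasklets may push(); pop() blocks only while empty."""
        def __init__(self):
            self._items = []
            self._channel = stackless.channel()

        def push(self, item):
            if self._channel.balance < 0:   # a consumer is already blocked
                self._channel.send(item)    # hand the item over directly
            else:
                self._items.append(item)    # otherwise just buffer it

        def pop(self):
            if self._items:
                return self._items.pop(0)
            return self._channel.receive()  # block until someone pushes

        def __len__(self):                  # the "is there any data?" check
            return len(self._items)

A bounded variant could make push() block on a second channel once len(self._items) reaches the limit, which would give the max-entries behaviour asked about above.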
For remote communications, pickling is involved. The object to send must be picklable. It excludes any I/O object unless the programmer creates its own pickling protocol for those. A copy of all the object tree will then be made. Sometime it's good (small objects), sometime it's bad (really complex, big objects, I/O objects, etc.). This is why I added the concept of ref_object() using PyPy's proxy object space. For such objects, a proxy can be made and only a reference object will be sent to the remote side. This object will have the same type as the original object but all operations will be forwarded to the host node. All replies will also be wrapped by proxies when sent back to the remote reference object. The only case where a proxy object is not created is with atomic types (string, int, float, etc). It's useless for those because they are immutable anyway. A remote access to those would introduce useless latency. With ref_object(), the object tree always stay on the initial node. A move() operation will also be added to those ref_object()s to be able to move them between interpreters if needed. > > For example, consider the following object, where "->" means it has a > reference to that object > > Object 1 -> Object 2 > > Object 2 -> Object 3 Object 2 -> Object 4 > Object 4 -> Object 2 > > > Now, let's say I have a tasklet like the following: > > .... -> incoming data = pointer/reference to Object 1 > > 1. read incoming data (get Object 1 reference) > 2. remove Object 3 > 3. send Object 3 to tasklet B > 4. send Object 1 to tasklet C > > Result: > tasklet B now has this object: > pointer/reference to Object 1, which contains the following tree: Object 1 -> Object 2 > Object 2 -> Object 4 > Object 4 -> Object 2 > > > tasklet C now has this object: > pointer/reference to Object 3, which contains the following tree: > Object 3 > > I think you swapped tasklet B and tasklet C for the end result! ;) > > > On the other hand, consider the following scenario: > > 1. read incoming data (get Object 1 reference) > 2. remove Object 4 > ERROR: this would not be possible, as it refers to Object 2 > Why isn't it possible? By removing "Object 4" I guess you mean removing this link: Object 2 -> Object 4? This is the only way Object 4 could be removed. > > > Sorry for the late answer, I was unavailable in the last few days. > > > > About send() and receive(), it depends on if the communication is local > > or not. For a local communication, anything can be passed since only > > the reference is sent. This is the base model for Stackless channels. > > For a remote communication (between two interpreters), any picklable > > object (a copy will then be made) and it includes channels and tasklets > > (for which a reference will automatically be created). > > > > The use of the PyPy proxy object space is to make remote communication > > more Stackless like by passing object by reference. If a ref_object is > > made, only a reference will be passed when a tasklet is moved or the > > object is sent on a channel. The object always resides where it was > > created. A move() operation will also be implemented on those objects > > so they can be moved around like tasklets. > > > > I hope it helps, > > > > Gabriel > > > > 2010/7/29 Kevin Ar18> > > > >> Hello Kevin, > >> I don't know if it can be a solution to your problem but for my > >> Master Thesis I'm working on making Stackless Python distributed. 
What > >> I did is working but not complete and I'm right now in the process of > >> writing the thesis (in french unfortunately). My code currently works > >> with PyPy's "stackless" module onlyis and use some PyPy specific > >> things. Here's what I added to Stackless: > >> > >> - Possibility to move tasklets easily (ref_tasklet.move(node_id)). A > >> node is an instance of an interpreter. > >> - Each tasklet has its global namespace (to avoid sharing of data). The > >> state is also easier to move to another interpreter this way. > >> - Distributed channels: All requests are known by all nodes using the > >> channel. > >> - Distributed objets: When a reference is sent to a remote node, the > >> object is not copied, a reference is created using PyPy's proxy object > >> space. > >> - Automated dependency recovery when an object or a tasklet is loaded > >> on another interpreter > >> > >> With a proper scheduler, many tasklets could be automatically spread in > >> multiple interpreters to use multiple cores or on multiple computers. A > >> bit like the N:M threading model where N lightweight threads/coroutines > >> can be executed on M threads. > > > > Was able to have a look at the API... > > If others don't mind my asking this on the mailing list: > > > > * .send() and .receive() > > What type of data can you send and receive between the tasklets? Can > > you pass entire Python objects? > > > > * .send() and .receive() memory model > > When you send data between tasklets (pass messages) or whateve you want > > to call it, how is this implemented under the hood? Does it use shared > > memory under the hood or does it involve a more costly copying of the > > data? I realize that if it is on another machine you have to copy the > > data, but what about between two threads? You mentioned PyPy's proxy > > object.... guess I'll need to read up on that. > > _______________________________________________ > > pypy-dev at codespeak.net > > http://codespeak.net/mailman/listinfo/pypy-dev > > > > > > > > -- > > Gabriel Lavoie > > glavoie at gmail.com > By the way, if you come to #pypy on FreeNode, I'm WildChild! I'm always there though not alway available. I'm in the EST timezone (UTC-5). See ya, Gabriel -- Gabriel Lavoie glavoie at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From bhartsho at yahoo.com Tue Aug 10 08:32:31 2010 From: bhartsho at yahoo.com (Hart's Antler) Date: Mon, 9 Aug 2010 23:32:31 -0700 (PDT) Subject: [pypy-dev] rstruct where is pack? Message-ID: <250182.36782.qm@web114016.mail.gq1.yahoo.com> Seems like struct.pack is not RPython? I see the examples for unpack in the tests folder, but not for packing. From benjamin at python.org Tue Aug 10 15:14:04 2010 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 10 Aug 2010 08:14:04 -0500 Subject: [pypy-dev] rstruct where is pack? In-Reply-To: <250182.36782.qm@web114016.mail.gq1.yahoo.com> References: <250182.36782.qm@web114016.mail.gq1.yahoo.com> Message-ID: 2010/8/10 Hart's Antler : > Seems like struct.pack is not RPython? ?I see the examples for unpack in the tests folder, but not for packing. struct.pack() is implemented in pypy/module/rstruct/. -- Regards, Benjamin From anto.cuni at gmail.com Tue Aug 10 15:32:29 2010 From: anto.cuni at gmail.com (Antonio Cuni) Date: Tue, 10 Aug 2010 15:32:29 +0200 Subject: [pypy-dev] rstruct where is pack? 
In-Reply-To: References: <250182.36782.qm@web114016.mail.gq1.yahoo.com> Message-ID: <4C6154ED.1070904@gmail.com> On 10/08/10 15:14, Benjamin Peterson wrote: > 2010/8/10 Hart's Antler : >> Seems like struct.pack is not RPython? I see the examples for unpack in the tests folder, but not for packing. > > struct.pack() is implemented in pypy/module/rstruct/. I suppose you mean pypy/module/struct. But if the OP is looking for an rpython lib to use in his rpython program, this is not exactly what he looks for, although I agree it could be adapted and ported to rlib. ciao, Anto From bhartsho at yahoo.com Wed Aug 11 03:08:51 2010 From: bhartsho at yahoo.com (Hart's Antler) Date: Tue, 10 Aug 2010 18:08:51 -0700 (PDT) Subject: [pypy-dev] rstruct where is pack? In-Reply-To: <4C6154ED.1070904@gmail.com> Message-ID: <951074.37494.qm@web114002.mail.gq1.yahoo.com> I have made a RPython replacement for struct pack/unpack that could go in rlib. It is not a drop in replacement, and for some reason i can't get long to work, but for simple packing and unpacking it will work. Posted the code on my blog if anybody ever runs into the same problem: http://pyppet.blogspot.com/2010/08/rpython-struct.html --- On Tue, 8/10/10, Antonio Cuni wrote: > From: Antonio Cuni > Subject: Re: [pypy-dev] rstruct where is pack? > To: "Benjamin Peterson" > Cc: "Hart's Antler" , pypy-dev at codespeak.net > Date: Tuesday, 10 August, 2010, 6:32 AM > On 10/08/10 15:14, Benjamin Peterson > wrote: > > 2010/8/10 Hart's Antler : > >> Seems like struct.pack is not RPython?? I see > the examples for unpack in the tests folder, but not for > packing. > > > > struct.pack() is implemented in pypy/module/rstruct/. > > I suppose you mean pypy/module/struct. > > But if the OP is looking for an rpython lib to use in his > rpython program, > this is not exactly what he looks for, although I agree it > could be adapted > and ported to rlib. > > ciao, > Anto > From arigo at tunes.org Wed Aug 11 14:20:31 2010 From: arigo at tunes.org (Armin Rigo) Date: Wed, 11 Aug 2010 14:20:31 +0200 Subject: [pypy-dev] I broke stackless In-Reply-To: References: Message-ID: <20100811122031.GA2733@code0.codespeak.net> Hi, For reference, after IRC discussions I fixed it in r76475. Armin From arigo at tunes.org Wed Aug 11 14:24:20 2010 From: arigo at tunes.org (Armin Rigo) Date: Wed, 11 Aug 2010 14:24:20 +0200 Subject: [pypy-dev] Percentage Python as RPython. In-Reply-To: <805862.40025.qm@web114009.mail.gq1.yahoo.com> References: <805862.40025.qm@web114009.mail.gq1.yahoo.com> Message-ID: <20100811122420.GB2733@code0.codespeak.net> Hi Hart, On Wed, Aug 04, 2010 at 06:04:00PM -0700, Hart's Antler wrote: > I think you want a frontend, not a backend. The frontend would take > in normal Python and convert it to RPython. I think the chances of getting this to work are "0 - 0.5 %", as per fijal's previous excellent answer. Writing in RPython requires a different state of mind than writing in normal Python (unless, maybe, you are a Java programmer that writes Java with the Python syntax; for that case, I would suggest that writing in Java in the first place is just as easy). A bientot, Armin. From stefan_ml at behnel.de Thu Aug 12 08:49:09 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 12 Aug 2010 08:49:09 +0200 Subject: [pypy-dev] What can Cython do for PyPy? Message-ID: Hi, there has recently been a move towards a .NET/IronPython port of Cython, mostly driven by the need for a fast NumPy port. 
During the related discussion, the question came up how much it would take to let Cython also target other runtimes, including PyPy. Given that PyPy already has a CPython C-API compatibility layer, I doubt that it would be hard to enable that. With my limited knowledge about the internals of that layer, I guess the question thus becomes: is there anything Cython could do to the C code it generates that would make the Cython generated extension modules run faster/better/safer on PyPy than they would currently? I never tried to make a Cython module actually run on PyPy (simply because I don't use PyPy), but I have my doubts that they'd run perfectly out of the box. While generally portable, I'm pretty sure the C code relies on some specific internals of CPython that PyPy can't easily (or efficiently) provide. Stefan From fijall at gmail.com Thu Aug 12 10:05:01 2010 From: fijall at gmail.com (Maciej Fijalkowski) Date: Thu, 12 Aug 2010 10:05:01 +0200 Subject: [pypy-dev] What can Cython do for PyPy? In-Reply-To: References: Message-ID: Hi Stefan. CPython extension compatibility layer is in alpha at best. I heavily doubt that anything would run out of the box. However, this is a cpython compatiblity layer anyway, it's not meant to be used as a long term solutions. First of all it's inneficient (and unclear if will ever be), but it's also unjitable. This means that to JIT, cpython extension is like a black box which should not be touched. Also, several concepts, like refcounting are completely alien to pypy and emulated. For example for numpy, I think a rewrite is necessary to make it fast (and as experiments have shown, it's possible to make it really fast), so I would not worry about using cython for speeding things up. In theory you should not need it and the boundary layer between cython-compiled code and JITted code would make you suffer anyway. There is another usecase for using cython for providing access to C libraries. This is a bit harder question and I don't have a good answer for that, but maybe cpython compatibility layer would be good enough in this case? I can't see how Cython can produce a "native" C code instead of CPython C code without some major effort. Cheers, fijal On Thu, Aug 12, 2010 at 8:49 AM, Stefan Behnel wrote: > Hi, > > there has recently been a move towards a .NET/IronPython port of Cython, > mostly driven by the need for a fast NumPy port. During the related > discussion, the question came up how much it would take to let Cython also > target other runtimes, including PyPy. > > Given that PyPy already has a CPython C-API compatibility layer, I doubt > that it would be hard to enable that. With my limited knowledge about the > internals of that layer, I guess the question thus becomes: is there > anything Cython could do to the C code it generates that would make the > Cython generated extension modules run faster/better/safer on PyPy than > they would currently? I never tried to make a Cython module actually run on > PyPy (simply because I don't use PyPy), but I have my doubts that they'd > run perfectly out of the box. While generally portable, I'm pretty sure the > C code relies on some specific internals of CPython that PyPy can't easily > (or efficiently) provide. 
> > Stefan > > _______________________________________________ > pypy-dev at codespeak.net > http://codespeak.net/mailman/listinfo/pypy-dev > From stefan_ml at behnel.de Thu Aug 12 11:25:18 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 12 Aug 2010 11:25:18 +0200 Subject: [pypy-dev] What can Cython do for PyPy? In-Reply-To: References: Message-ID: Maciej Fijalkowski, 12.08.2010 10:05: > On Thu, Aug 12, 2010 at 8:49 AM, Stefan Behnel wrote: >> there has recently been a move towards a .NET/IronPython port of Cython, >> mostly driven by the need for a fast NumPy port. During the related >> discussion, the question came up how much it would take to let Cython also >> target other runtimes, including PyPy. >> >> Given that PyPy already has a CPython C-API compatibility layer, I doubt >> that it would be hard to enable that. With my limited knowledge about the >> internals of that layer, I guess the question thus becomes: is there >> anything Cython could do to the C code it generates that would make the >> Cython generated extension modules run faster/better/safer on PyPy than >> they would currently? I never tried to make a Cython module actually run on >> PyPy (simply because I don't use PyPy), but I have my doubts that they'd >> run perfectly out of the box. While generally portable, I'm pretty sure the >> C code relies on some specific internals of CPython that PyPy can't easily >> (or efficiently) provide. > > CPython extension compatibility layer is in alpha at best. I heavily > doubt that anything would run out of the box. However, this is a > cpython compatiblity layer anyway, it's not meant to be used as a long > term solutions. First of all it's inneficient (and unclear if will > ever be) If you only use it to call into non-trivial Cython code (e.g. some heavy calculations on NumPy tables), the call overhead should be mostly negligible, maybe even close to that in CPython. You could even provide some kind of fast-path to 'cpdef' functions (i.e. functions that are callable from both C and Python) and 'api' functions (which are currently exported at the module API level using the PyCapsule mechanism). That would reduce the call overhead to that of a C call. Then, a lot of Cython code doesn't do much ref-counting and the like but simply runs in plain C. So, often enough, there won't be that much overhead involved in the code itself either, especially in tight loops where users prune away all CPython interaction anyway. > but it's also unjitable. This means that to JIT, cpython > extension is like a black box which should not be touched. Well, unless both sides learn about each other, that is. It won't necessarily impact the JIT, but then again, a JIT usually won't have a noticeable impact on the performance of Cython code anyway. > Also, several concepts, like refcounting are completely alien to pypy > and emulated. Sure. That's why I asked if there is anything that Cython can help to improve here. For example, the code it generates for INCREF/DECREF operations is not only configurable at the C preprocessor level. > For example for numpy, I think a rewrite is necessary to make it fast > (and as experiments have shown, it's possible to make it really fast), > so I would not worry about using cython for speeding things up. This isn't only about making things fast when being rewritten. This is also about accessing and reusing existing code in a new environment. 
Cython is becoming increasingly popular in the numerics community, and a lot of Cython code is being written as we speak, not only in the SciPy/NumPy environment. People even find it attractive enough to start rewriting their CPython extension modules (most often library wrappers) from C in Cython, both for performance and TCO reasons. > There is another usecase for using cython for providing access to C > libraries. This is a bit harder question and I don't have a good > answer for that, but maybe cpython compatibility layer would be good > enough in this case? I can't see how Cython can produce a "native" C > code instead of CPython C code without some major effort. Native (standalone) C code isn't the goal, just something that adapts well to what PyPy can provide as a CPython compatibility layer. If Cython modules work across independent Python implementations, that would be the most simple way by far to make lots of them available cross-platform, thus making it a lot simpler to switch between different implementations. Stefan From santagada at gmail.com Thu Aug 12 16:31:01 2010 From: santagada at gmail.com (Leonardo Santagada) Date: Thu, 12 Aug 2010 11:31:01 -0300 Subject: [pypy-dev] What can Cython do for PyPy? In-Reply-To: References: Message-ID: <303CCB23-07C0-4D0F-90D2-0DD908DB4043@gmail.com> On Aug 12, 2010, at 3:49 AM, Stefan Behnel wrote: > Hi, > > there has recently been a move towards a .NET/IronPython port of Cython, > mostly driven by the need for a fast NumPy port. During the related > discussion, the question came up how much it would take to let Cython also > target other runtimes, including PyPy. > > Given that PyPy already has a CPython C-API compatibility layer, I doubt > that it would be hard to enable that. With my limited knowledge about the > internals of that layer, I guess the question thus becomes: is there > anything Cython could do to the C code it generates that would make the > Cython generated extension modules run faster/better/safer on PyPy than > they would currently? I never tried to make a Cython module actually run on > PyPy (simply because I don't use PyPy), but I have my doubts that they'd > run perfectly out of the box. While generally portable, I'm pretty sure the > C code relies on some specific internals of CPython that PyPy can't easily > (or efficiently) provide. A possible solution I think would be to do an oo backend for cython. That could be made to generate C# or RPython code. The problem remains that pypy still doesn't have separate compilation so you cannot make a external module for the pypy interpreter after it is translated. So it is hard, maybe harder than anyone on cython would like, but I still think it is a good solution. (Unless I'm mistaken in any of my assumptions, and then it is a terrible solution :) -- Leonardo Santagada santagada at gmail.com From p.giarrusso at gmail.com Thu Aug 12 17:35:40 2010 From: p.giarrusso at gmail.com (Paolo Giarrusso) Date: Thu, 12 Aug 2010 17:35:40 +0200 Subject: [pypy-dev] What can Cython do for PyPy? In-Reply-To: References: Message-ID: I agree with the motivations given by Stefan - two interesting possibilities would be to: a) first, test the compatibility layer with Cython generated code b) possibly, allow users to use the Python API while replacing refcounting with another, more meaningful, PyPy-specific API* for a garbage collected heap. However, such an API is radically different. I'm also not sure how well such an API would mesh with the CPython API, actually. 
If Cython could support such an API, that would be great. But I'm unsure whether this is worth it, for Cython, and more in general for other modules (one could easily and elegantly support both CPython and PyPy with preprocessor tricks). See further below about why call overhead is not the biggest performance problem when not inlining. * I thought the Java Native Interface (JNI) design of local and global references (http://download.oracle.com/javase/6/docs/technotes/guides/jni/spec/design.html#wp16785) would work here, with some adaptation. However, if your moving GCs support pinning of objects, as I expect to be necessary to interact with CPython code, I would do an important change to that API: instead of having object references be pointers to (movable by the GC) pointers to objects, like in the JNI API, PyPy should use plain pinned pointers. The pinning would not be apparent in the type, but that should be fine I guess. Problems arise when PyPy-aware code calls code which still uses the refcounting API. It is mostly safe to ignore the refcounting (even decreases) for local references, but I'm unsure about persistent references, even if it's probably still the best solution, so that the PyPy-aware code handles the lifecycle by itself. On Thu, Aug 12, 2010 at 11:25, Stefan Behnel wrote: > Maciej Fijalkowski, 12.08.2010 10:05: >> On Thu, Aug 12, 2010 at 8:49 AM, Stefan Behnel wrote: > If you only use it to call into non-trivial Cython code (e.g. some heavy > calculations on NumPy tables), the call overhead should be mostly > negligible, maybe even close to that in CPython. You could even provide > some kind of fast-path to 'cpdef' functions (i.e. functions that are > callable from both C and Python) and 'api' functions (which are currently > exported at the module API level using the PyCapsule mechanism). That would > reduce the call overhead to that of a C call. >> but it's also unjitable. This means that to JIT, cpython >> extension is like a black box which should not be touched. > Well, unless both sides learn about each other, that is. It won't > necessarily impact the JIT, but then again, a JIT usually won't have a > noticeable impact on the performance of Cython code anyway. Call overhead is not the biggest problem, I guess (well, if it's bigger than that in C, it might be); it's IMHO the minor problem when you can't inline. Inlining is important because it allows to do more optimizations on the combined code. Now, it might or might not apply to your typical use cases (present and future), you should just keep this issue in mind, too. Whenever you say "If you only use it to call into non-trivial Cython code", you imply that some kind of functional abstraction, the one where you write short functions, such as accessors, are not efficiently supported. For instance, if you call two functions, each containing a parallel for loops, fusing the loops requires inlining the functions to expose the loops. Inlining accessors (getters and setters) allows to recognize that they often don't need to be called over and over again, i.e., common subexpression elimination, which you can't do on a normal (impure) function. To make a particularly dramatic example (since it comes from C) of a quadratic-to-linear optimization: a loop like for (i = 0; i < strlen(s); i++) { //do something on s without modifying it } takes quadratic time, because strlen takes linear time and is called at each loop. Can the optimizer fix this? 
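For illustration, here is the same pitfall transliterated into Python, a sketch added for clarity rather than part of the original message; expensive_length is a made-up stand-in for any pure but linear-time helper such as C's strlen:

def expensive_length(s):
    # deliberately linear-time, like strlen() walking to the terminating NUL
    n = 0
    for _ in s:
        n += 1
    return n

def shout_quadratic(s):
    out = []
    i = 0
    while i < expensive_length(s):   # re-evaluated on every iteration: O(n**2) overall
        out.append(s[i].upper())
        i += 1
    return ''.join(out)

def shout_linear(s):
    out = []
    n = expensive_length(s)          # hoisted by hand: safe only because s never changes below
    i = 0
    while i < n:
        out.append(s[i].upper())
        i += 1
    return ''.join(out)

A compiler can only turn the first form into the second automatically if it can prove that expensive_length is pure and that s is not modified inside the loop, which is exactly the kind of information inlining exposes.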
The simplest way for it is to inline everything, then it could notice that calculating strlen only once is safe. In C with GCC extensions, one could annotate strlen as pure, and use functions which take s as a const parameter (but I'm unsure if it actually works). In Python (and even in Java), anything such should work without annotations. Of course, one can't rely on this quadratic-linear optimization unless it's guaranteed to work (like tail call elimination), so I wouldn't do it in this case; this point relates to the wider issue of unreliable optimizations and "sufficiently smart compilers", better discussed at http://prog21.dadgum.com/40.html (not mine). -- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/ From jbaker at zyasoft.com Thu Aug 12 17:41:39 2010 From: jbaker at zyasoft.com (Jim Baker) Date: Thu, 12 Aug 2010 09:41:39 -0600 Subject: [pypy-dev] What can Cython do for PyPy? In-Reply-To: References: Message-ID: [crossposting to jython-dev] Because of some conversations I had with Maciej (mostly at Folsom Coffee in Boulder :) ), we are considering adding support for the CPython C-Extension API for Jython, modeling what has already been done in PyPy and IronPython. Although I think it may make a lot of sense to port NumPy to Java, and have argued for it in the past, being pragmatic suggests it's better to work with the tide of NumPy/Cython than against it. Also, this can bring in a large swath of existing libraries to work with Jython, including those coded against SWIG, at the cost that it will not run under most security manager policies. I think that's a reasonable tradeoff. Similar concerns that Maciej raises apply to Jython. No Java JIT will inline such native code, marshaling from the Java domain to the native one will be expensive, etc. But this is (mostly) true of Jython today, from Python code to Java (although invokedynamic will at least reduce some of those costs). But users can still take advantage of Java to achieve much better performance from Jython, if they are careful about structuring the execution of their code. At the end of the day, Jython to C code, including that produced by Cython should see a similar performance profile to CPython to C code, as long as they don't hammer the INCREF/DECREF *functions*. (JRuby is implementing something similar, and we probably can borrow their "refcounting" support.) But of course that's exactly what one needs to avoid to write performant extension code anyway in CPython, at least if it's to be multithreaded. One interesting part of this discussion is whether we can support lock eliding. This is one part of JIT inlining that you don't want to give up for multithreaded performance. Rather than having C code callback into Java to release the GIL (which is only global for such C code!), it would be better to have a marker on the C code that allows for immediate release, or perhaps some other inlinable Java stub. I could imagine this could be readily supported by Cython (and perhaps already is). Lastly, I want to emphasize again that if/when Jython adds support for the C extension API, the "GIL" and "refcounting" support will only be for such C code! We like our concurrency support and we are not giving it up :) - Jim On Thu, Aug 12, 2010 at 3:25 AM, Stefan Behnel wrote: > Maciej Fijalkowski, 12.08.2010 10:05: > > On Thu, Aug 12, 2010 at 8:49 AM, Stefan Behnel wrote: > >> there has recently been a move towards a .NET/IronPython port of Cython, > >> mostly driven by the need for a fast NumPy port. 
During the related > >> discussion, the question came up how much it would take to let Cython > also > >> target other runtimes, including PyPy. > >> > >> Given that PyPy already has a CPython C-API compatibility layer, I doubt > >> that it would be hard to enable that. With my limited knowledge about > the > >> internals of that layer, I guess the question thus becomes: is there > >> anything Cython could do to the C code it generates that would make the > >> Cython generated extension modules run faster/better/safer on PyPy than > >> they would currently? I never tried to make a Cython module actually run > on > >> PyPy (simply because I don't use PyPy), but I have my doubts that they'd > >> run perfectly out of the box. While generally portable, I'm pretty sure > the > >> C code relies on some specific internals of CPython that PyPy can't > easily > >> (or efficiently) provide. > > > > CPython extension compatibility layer is in alpha at best. I heavily > > doubt that anything would run out of the box. However, this is a > > cpython compatiblity layer anyway, it's not meant to be used as a long > > term solutions. First of all it's inneficient (and unclear if will > > ever be) > > If you only use it to call into non-trivial Cython code (e.g. some heavy > calculations on NumPy tables), the call overhead should be mostly > negligible, maybe even close to that in CPython. You could even provide > some kind of fast-path to 'cpdef' functions (i.e. functions that are > callable from both C and Python) and 'api' functions (which are currently > exported at the module API level using the PyCapsule mechanism). That would > reduce the call overhead to that of a C call. > > Then, a lot of Cython code doesn't do much ref-counting and the like but > simply runs in plain C. So, often enough, there won't be that much overhead > involved in the code itself either, especially in tight loops where users > prune away all CPython interaction anyway. > > > > but it's also unjitable. This means that to JIT, cpython > > extension is like a black box which should not be touched. > > Well, unless both sides learn about each other, that is. It won't > necessarily impact the JIT, but then again, a JIT usually won't have a > noticeable impact on the performance of Cython code anyway. > > > > Also, several concepts, like refcounting are completely alien to pypy > > and emulated. > > Sure. That's why I asked if there is anything that Cython can help to > improve here. For example, the code it generates for INCREF/DECREF > operations is not only configurable at the C preprocessor level. > > > > For example for numpy, I think a rewrite is necessary to make it fast > > (and as experiments have shown, it's possible to make it really fast), > > so I would not worry about using cython for speeding things up. > > This isn't only about making things fast when being rewritten. This is also > about accessing and reusing existing code in a new environment. Cython is > becoming increasingly popular in the numerics community, and a lot of > Cython code is being written as we speak, not only in the SciPy/NumPy > environment. People even find it attractive enough to start rewriting their > CPython extension modules (most often library wrappers) from C in Cython, > both for performance and TCO reasons. > > > > There is another usecase for using cython for providing access to C > > libraries. 
This is a bit harder question and I don't have a good > > answer for that, but maybe cpython compatibility layer would be good > > enough in this case? I can't see how Cython can produce a "native" C > > code instead of CPython C code without some major effort. > > Native (standalone) C code isn't the goal, just something that adapts well > to what PyPy can provide as a CPython compatibility layer. If Cython > modules work across independent Python implementations, that would be the > most simple way by far to make lots of them available cross-platform, thus > making it a lot simpler to switch between different implementations. > > Stefan > > _______________________________________________ > pypy-dev at codespeak.net > http://codespeak.net/mailman/listinfo/pypy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anto.cuni at gmail.com Thu Aug 12 20:31:06 2010 From: anto.cuni at gmail.com (Antonio Cuni) Date: Thu, 12 Aug 2010 20:31:06 +0200 Subject: [pypy-dev] [pypy-svn] r76608 - in pypy/branch/jit-bounds/pypy/jit/metainterp: . test In-Reply-To: <20100812170214.45AC5282B9E@codespeak.net> References: <20100812170214.45AC5282B9E@codespeak.net> Message-ID: <4C643DEA.9020809@gmail.com> On 12/08/10 19:02, hakanardo at codespeak.net wrote: > + def boundint_gt(self, val): > + if val is None: return > + self.minint = val + 1 what happens if val == sys.maxint? ciao, Anto From kevinar18 at hotmail.com Fri Aug 13 01:42:54 2010 From: kevinar18 at hotmail.com (Kevin Ar18) Date: Thu, 12 Aug 2010 19:42:54 -0400 Subject: [pypy-dev] pre-emptive micro-threads utilizing shared memory message passing? In-Reply-To: References: , , , , , Message-ID: Sorry for not gettin back to you sooner. I don't mind replying to the mailing list unless it annoys someone? Maybe some people could be interested by this discussion. You have a lot of questions! :) My answers are inline. * Message passing When you create a tasklet, you assign a set number of queues or streams to it (it can have many) and whether they extract data from them or write to them (they can only either extract or write to it as noted above). The tasklet's global namespace has access to these queues or streams and can extract or add data to them. In my case, I look at message passing from the perspective of the tasklet. A tasklet can either be assigned a certain number of "in ports" and a certain number of "out ports." In this case the "in ports" are the .read() end of a queue or stream and the "out ports" are the .send() part of a queue or stream. Sorry, I don't really understand what you're trying to explain here. Maybe an example could be helpful? :) * Scheduler For the scheduler, I would need to control when a tasklet runs. Currently, I am thinking that I would look at all the "in ports" that a tasklet has and make sure each one has some data. Only then would the tasklet be scheduled to run by the scheduler. Couldn't all those ports (channels) be read one at a time, then the processing could be done? I don't exactly see the need to play with the scheduler. Channels are blocking. A tasklet will be anyway unscheduled when it tries to read on a channel in which no data is available. http://www.jpaulmorrison.com/fbp/concepts.htm Figure 3.6 and Figure 3.7 are a good example. Let's say Figure 3.7 is the tasklet (the one with a local namespace and no access to global memory or memory in other tasklets). IN, ACC, REJ are pointers to a shared memory location (from an implementation standpoint). 
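For illustration, a minimal sketch (added for clarity, not code from the thread) of how such a component can already be expressed with the stackless channel API; the IN/ACC/REJ names follow the FBP example, while the predicate and the None end-of-stream sentinel are assumptions of the sketch:

import stackless

def selector(in_ch, acc_ch, rej_ch, predicate):
    # one FBP-style component: pull records from IN, route each to ACC or REJ
    while True:
        record = in_ch.receive()      # blocks, so the scheduler suspends us until data arrives
        if record is None:            # end-of-stream sentinel
            acc_ch.send(None)
            rej_ch.send(None)
            return
        if predicate(record):
            acc_ch.send(record)
        else:
            rej_ch.send(record)

if __name__ == '__main__':
    IN, ACC, REJ = stackless.channel(), stackless.channel(), stackless.channel()

    def producer():
        for value in [3, -1, 7, None]:
            IN.send(value)

    def consumer(name, ch):
        while True:
            value = ch.receive()
            if value is None:
                return
            print name, value

    stackless.tasklet(producer)()
    stackless.tasklet(selector)(IN, ACC, REJ, lambda r: r >= 0)
    stackless.tasklet(consumer)('ACC', ACC)
    stackless.tasklet(consumer)('REJ', REJ)
    stackless.run()

The blocking receive() is what ties this to the scheduling question discussed here: a component with no input available simply stays blocked on its channel and never needs to be polled.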
IN, ACC, REJ are either a queue or buffer/pipe/steam (from the perspective of the programmer). The tasklet can only read/extract data from IN. The tasklet can only write to ACC and REJ. > Couldn't all those ports (channels) be read one at a time, then the processing could be done? Not sure exactly, what you mean, but as shown in Figure 3.7, different parts of code will read or write to different ports at different times. > A tasklet will be anyway unscheduled when it tries to read on a channel in which no data is available. Good idea. If there's no data to read, the tasklet can yield. ... but I need to know when the tasklet can be put back into the scheduler queue Then again, I don't know how I will want to do the scheduler... and would like the low level primitives to explore different styles. Anyways, at this point, I guess this whole discussion is not that important. I should probably make something simpler for now just to try things out. Then maybe I'll know if I want to even bother working on something better. However, if you would like me to keep you up to date, I can contact you via email a few months from now. (Let me know and I'll give you a different email to use). -------------- next part -------------- An HTML attachment was scrubbed... URL: From bhartsho at yahoo.com Fri Aug 13 03:42:23 2010 From: bhartsho at yahoo.com (Hart's Antler) Date: Thu, 12 Aug 2010 18:42:23 -0700 (PDT) Subject: [pypy-dev] Wrapping C, void pointer Message-ID: <986277.65536.qm@web114019.mail.gq1.yahoo.com> I am wrapping PortAudio for RPython. Following the source code in RSDL as a guide i can clearly see what to do with constants, structs, functions and so on. So far so good until i reached something new in PortAudio i am not sure how to deal with, a void typedef and a void pointer. In the RSDL example pointers were defined by Ptr = lltype.Ptr(lltype.ForwardReference()), then in the CConfig class the struct was defined and rffi_platform.configure parses, finally the Ptr is told TO become the type - given the output of platform.configure. How do we deal with this situation from PortAudio? # from portaudio.h typedef void PaStream; PaError Pa_OpenStream( PaStream** stream, const PaStreamParameters *inputParameters, const PaStreamParameters *outputParameters, double sampleRate, unsigned long framesPerBuffer, PaStreamFlags streamFlags, PaStreamCallback *streamCallback, void *userData ); ############################### I have tried the following but it fails when i try to malloc the void pointer. OpenDefaultStream = external( 'Pa_OpenDefaultStream', [ rffi.VOIDPP, # PaStream** rffi.INT, # numInputChannels rffi.INT, # numOutputChannels rffi.INT, # sampleFormat rffi.INT, # sampleRate rffi.INT, # framesPerBuffer rffi.INT, #streamcallback rffi.VOIDP, #userData ], rffi.INT ) Stream = lltype.Void #rffi.VOIDP def test(): print 'portaudio version %s' %GetVersion() assert Initialize() == 0 # paNoError = 0, error code is returned on init fail. 
stream = lltype.malloc(Stream, flavor='raw') try: ok = OpenDefaultStream( stream, 1, 1, Int16, 22050, FramesPerBufferUnspecified, 0 ) finally: lltype.free(stream, flavor='raw') Terminate() -brett From arigo at tunes.org Fri Aug 13 10:39:49 2010 From: arigo at tunes.org (Armin Rigo) Date: Fri, 13 Aug 2010 10:39:49 +0200 Subject: [pypy-dev] Wrapping C, void pointer In-Reply-To: <986277.65536.qm@web114019.mail.gq1.yahoo.com> References: <986277.65536.qm@web114019.mail.gq1.yahoo.com> Message-ID: <20100813083948.GA20768@code0.codespeak.net> Hi Hart, On Thu, Aug 12, 2010 at 06:42:23PM -0700, Hart's Antler wrote: > I am wrapping PortAudio for RPython. Why? Writing it in standard ctypes would give really bad performance? Armin From bhartsho at yahoo.com Fri Aug 13 15:02:12 2010 From: bhartsho at yahoo.com (Hart's Antler) Date: Fri, 13 Aug 2010 06:02:12 -0700 (PDT) Subject: [pypy-dev] Wrapping C, void pointer In-Reply-To: <20100813083948.GA20768@code0.codespeak.net> Message-ID: <634594.45561.qm@web114019.mail.gq1.yahoo.com> Hi Armin, i wanted something faster than ctypes, i think thats why Hubert Pham used the Python C API when doing pyaudio before, also i want to do DSP on the samples and want to option to do as many effects as possible in real-time. I figured out my problem, rpyportaudio is on google code now, http://code.google.com/p/rpyportaudio/ --- On Fri, 8/13/10, Armin Rigo wrote: > From: Armin Rigo > Subject: Re: [pypy-dev] Wrapping C, void pointer > To: "Hart's Antler" > Cc: pypy-dev at codespeak.net > Date: Friday, 13 August, 2010, 1:39 AM > Hi Hart, > > On Thu, Aug 12, 2010 at 06:42:23PM -0700, Hart's Antler > wrote: > > I am wrapping PortAudio for RPython. > > Why?? Writing it in standard ctypes would give really > bad performance? > > > Armin > From andrewfr_ice at yahoo.com Fri Aug 13 18:52:14 2010 From: andrewfr_ice at yahoo.com (Andrew Francis) Date: Fri, 13 Aug 2010 09:52:14 -0700 (PDT) Subject: [pypy-dev] pypy-dev Digest, Vol 361, Issue 5 In-Reply-To: Message-ID: <66172.2718.qm@web120712.mail.ne1.yahoo.com> Hi Kevin: Message: 4 Date: Thu, 12 Aug 2010 19:42:54 -0400 From: Kevin Ar18 Subject: Re: [pypy-dev] pre-emptive micro-threads utilizing shared memory message passing? To: Message-ID: Content-Type: text/plain; charset="iso-8859-1" >I don't mind replying to the mailing list unless it annoys someone? Maybe >some people could be interested by this discussion. I am finding it a bit difficult to follow this thread. I am not sure who is saying what. Also I don't know if you are talking about an entirely new system or the stackless.py module. >In my case, I look at message passing from the perspective of the >tasklet. A tasklet can either be assigned a certain number of "in ports" >and a certain number of "out ports." In this case the "in ports" are the >.read() end of a queue or stream and the "out ports" are the .send() part >of a queue or stream. A part of the model that Stackless uses is that tasklets have channels. Channels have send() and receive() operations. >For the scheduler, I would need to control when a tasklet runs. >Currently, I am thinking that I would look at all the "in ports" that a >tasklet has and make sure each one has some data. Only then would the >tasklet be scheduled to run by the scheduler. The current scheduler already does this. However there are no in or out ports, just operations that can proceed. >Couldn't all those ports (channels) be read one at a time, then the >processing could be done? 
If you are using stackless.py - the tasklet will block if it encounters a channel with no target on the other side. I wrote a select() function that allows monitoring on multiple channels. >Good idea. If there's no data to read, the tasklet can yield. ... but I >need to know when the tasklet can be put back into the scheduler queue I don't want to toot my horn but I gave a talk that covers how rendez-vous semantics works at EuroPython: http://andrewfr.wordpress.com/2010/07/24/prototyping-gos-select-and-beyond/ Cheers, Andrew From bhartsho at yahoo.com Sat Aug 14 03:51:00 2010 From: bhartsho at yahoo.com (Hart's Antler) Date: Fri, 13 Aug 2010 18:51:00 -0700 (PDT) Subject: [pypy-dev] RPython function callback from C Message-ID: <662548.18463.qm@web114020.mail.gq1.yahoo.com> I have the PortAudio blocking API working, simple reading and writing to the sound card works. PortAudio also has an async API where samples are fed to a callback as they stream in. But i'm not sure how to define a RPython function that will be called as a callback from C, is this even possible? I see some references in the source of rffi that seems to suggest it is possible. Full source code is here http://pastebin.com/6YHbT7CU I'm passing the callback like this: def stream_callback( *args ): print 'stream callback' return 0 # 0=continue, 1=complete, 2=abort stream_callback_ptr = rffi.CCallback([], rffi.INT) OpenDefaultStream = rffi.llexternal( 'Pa_OpenDefaultStream', [ StreamRefPtr, # PaStream** rffi.INT, # numInputChannels rffi.INT, # numOutputChannels rffi.INT, # sampleFormat rffi.DOUBLE, # double sampleRate rffi.INT, # unsigned long framesPerBuffer #rffi.VOIDP, #PaStreamCallback *streamCallback stream_callback_ptr, rffi.VOIDP, #void *userData ], rffi.INT, # return compilation_info=eci, _callable=stream_callback ) entrypoint(): ... callback = lltype.nullptr( stream_callback_ptr.TO ) ok = OpenDefaultStream( streamptr, 2, 2, Int16, 22050.0, 512, callback, userdata ) From kevinar18 at hotmail.com Sat Aug 14 05:29:15 2010 From: kevinar18 at hotmail.com (Kevin Ar18) Date: Fri, 13 Aug 2010 23:29:15 -0400 Subject: [pypy-dev] ongoing microthread discussions In-Reply-To: <66172.2718.qm@web120712.mail.ne1.yahoo.com> References: , <66172.2718.qm@web120712.mail.ne1.yahoo.com> Message-ID: > >I don't mind replying to the mailing list unless it annoys someone? Maybe >some people could be interested by this discussion. > > I am finding it a bit difficult to follow this thread. I am not sure who is saying what. Also I don't know if you are talking about an entirely new system or the stackless.py module. An entirely new system/way of doing things -- meaning I don't think the stackless style would fit. Originally, I was hoping for some way to achieve what I want in Python across multiple cores, but I'm finding there is no such primitives to do that effectively. I know the basics of how I would do it in a lower level language. Yes, there are many different topics that this brought up. Here's a summary: * I wanted to work on a different way of doing things (different than stackless)... but I needed lower level primitives that allowed me to pass data back and forth between threads using shared memory queues or pipes (instead of the current method that copies the data back and forth) * I then asked about the difficulty in doing some form of limited shared memory (one that wouldn't involve a GIL overhaul) * A branch of the discussion involved people discuss various locking problems that might cause... 
* The author of Kamaelia posted a message and we had a brief discussion down that road. (His project is very similar to what I want to do.) * Gabriel mentioned his project and we had a brief discussion. His project has some similarities ... but still is probably too different for my needs, but maybe would be very interesting to other people here. * In one of the emails, I brought up a possible solution to offering shared memory "message passing" that would not require locks of locking issues... but it really is too much for me to get involved with now. ... and I guess by now the discussion has pretty much died off as there was really nothing more.... -------------- next part -------------- An HTML attachment was scrubbed... URL: From jbaker at zyasoft.com Sat Aug 14 08:13:26 2010 From: jbaker at zyasoft.com (Jim Baker) Date: Sat, 14 Aug 2010 00:13:26 -0600 Subject: [pypy-dev] ongoing microthread discussions In-Reply-To: References: <66172.2718.qm@web120712.mail.ne1.yahoo.com> Message-ID: Kevin, You may want to broaden your candidates. Jython already supports multiple cores with no GIL and shared memory with well-defined memory semantics derived directly from Java's memory model (and compatible with the informal memory model that we see in CPython). Because JRuby needs it for efficient support of Ruby 1.9 generators, which are more general than Python's (non-nested yields), there has been substantial attention paid to the MLVM coroutine support which has demonstrated 1M+ microthread scalability in a single JVM process. It would be amazing if someone spent some time looking at this in Jython. - Jim On Fri, Aug 13, 2010 at 9:29 PM, Kevin Ar18 wrote: > > >I don't mind replying to the mailing list unless it annoys someone? > Maybe >some people could be interested by this discussion. > > > > I am finding it a bit difficult to follow this thread. I am not sure who > is saying what. Also I don't know if you are talking about an entirely new > system or the stackless.py module. > An entirely new system/way of doing things -- meaning I don't think the > stackless style would fit. > > Originally, I was hoping for some way to achieve what I want in Python > across multiple cores, but I'm finding there is no such primitives to do > that effectively. I know the basics of how I would do it in a lower level > language. > > Yes, there are many different topics that this brought up. Here's a > summary: > * I wanted to work on a different way of doing things (different than > stackless)... but I needed lower level primitives that allowed me to pass > data back and forth between threads using shared memory queues or pipes > (instead of the current method that copies the data back and forth) > * I then asked about the difficulty in doing some form of limited shared > memory (one that wouldn't involve a GIL overhaul) > * A branch of the discussion involved people discuss various locking > problems that might cause... > * The author of Kamaelia posted a message and we had a brief discussion > down that road. (His project is very similar to what I want to do.) > * Gabriel mentioned his project and we had a brief discussion. His project > has some similarities ... but still is probably too different for my needs, > but maybe would be very interesting to other people here. > * In one of the emails, I brought up a possible solution to offering shared > memory "message passing" that would not require locks of locking issues... > but it really is too much for me to get involved with now. > > ... 
and I guess by now the discussion has pretty much died off as there was > really nothing more.... > > _______________________________________________ > pypy-dev at codespeak.net > http://codespeak.net/mailman/listinfo/pypy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From arigo at tunes.org Sat Aug 14 15:50:02 2010 From: arigo at tunes.org (Armin Rigo) Date: Sat, 14 Aug 2010 15:50:02 +0200 Subject: [pypy-dev] PyPy speed center not updating any more Message-ID: <20100814135002.GA1941@code0.codespeak.net> Hi all, The PyPy speed center does not display any update more recent than July 29. The buildbot infrastructure correctly puts them into files codespeak.net:~buildmaster/bench_results/REV.json, but the web site at http://speed.pypy.org/ does not get updated. Help please! A bientot, Armin. From arigo at tunes.org Sat Aug 14 16:23:21 2010 From: arigo at tunes.org (Armin Rigo) Date: Sat, 14 Aug 2010 16:23:21 +0200 Subject: [pypy-dev] PyPy speed center not updating any more In-Reply-To: <20100814135002.GA1941@code0.codespeak.net> References: <20100814135002.GA1941@code0.codespeak.net> Message-ID: <20100814142321.GA4071@code0.codespeak.net> Hi all, On Sat, Aug 14, 2010 at 03:50:02PM +0200, Armin Rigo wrote: > The PyPy speed center does not display any update more recent than July 29. Wrong (thanks Antonio). It's only the twisted_web benchmark that stops at July 29; it was certainly removed at that date. For the others it works as expected. The most recent results of today (76624) have been run on the kill-caninline branch. Armin. From kevinar18 at hotmail.com Sun Aug 15 02:24:20 2010 From: kevinar18 at hotmail.com (Kevin Ar18) Date: Sat, 14 Aug 2010 20:24:20 -0400 Subject: [pypy-dev] ongoing microthread discussions In-Reply-To: References: , <66172.2718.qm@web120712.mail.ne1.yahoo.com> , Message-ID: You may want to broaden your candidates. Jython already supports multiple cores with no GIL and shared memory with well-defined memory semantics derived directly from Java's memory model (and compatible with the informal memory model that we see in CPython). Because JRuby needs it for efficient support of Ruby 1.9 generators, which are more general than Python's (non-nested yields), there has been substantial attention paid to the MLVM coroutine support which has demonstrated 1M+ microthread scalability in a single JVM process. It would be amazing if someone spent some time looking at this in Jython. For me, anything based on the Java VM or copyleft code it out of question. However, you are quite right in that it is not necessary that I use PyPy. For example, if Unladen Swallow had the primitives I needed, that would be great too. As a side note, PyPy does have two advantages: speed and that it is coded in RPython: which might even allow me to just hack PyPy itself at some point. :) BTW, thanks for the suggestion. Now that you brought up the topic of different implementations, I should probably check on what is going on in regards to Unladen Swallow, etc.... -------------- next part -------------- An HTML attachment was scrubbed... URL: From jbaker at zyasoft.com Sun Aug 15 16:31:38 2010 From: jbaker at zyasoft.com (Jim Baker) Date: Sun, 15 Aug 2010 08:31:38 -0600 Subject: [pypy-dev] ongoing microthread discussions In-Reply-To: References: <66172.2718.qm@web120712.mail.ne1.yahoo.com> Message-ID: To clarify, there are numerous implementations of the JVM that are not copyleft, such as Apache Harmony. Of course the MLVM work I citedis not one of them. 
Jython itself is licensed under the Python Software License. On Sat, Aug 14, 2010 at 6:24 PM, Kevin Ar18 wrote: > You may want to broaden your candidates. Jython already supports multiple > cores with no GIL and shared memory with well-defined memory semantics > derived directly from Java's memory model (and compatible with the informal > memory model that we see in CPython). Because JRuby needs it for efficient > support of Ruby 1.9 generators, which are more general than Python's > (non-nested yields), there has been substantial attention paid to the MLVM > coroutine support which has demonstrated 1M+ microthread scalability in a > single JVM process. > > It would be amazing if someone spent some time looking at this in Jython. > > > For me, anything based on the Java VM or copyleft code it out of question. > However, you are quite right in that it is not necessary that I use PyPy. > For example, if Unladen Swallow had the primitives I needed, that would be > great too. > > As a side note, PyPy does have two advantages: speed and that it is coded > in RPython: which might even allow me to just hack PyPy itself at some > point. :) > > BTW, thanks for the suggestion. Now that you brought up the topic of > different implementations, I should probably check on what is going on in > regards to Unladen Swallow, etc.... > > _______________________________________________ > pypy-dev at codespeak.net > http://codespeak.net/mailman/listinfo/pypy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amauryfa at gmail.com Sun Aug 15 21:32:51 2010 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Sun, 15 Aug 2010 21:32:51 +0200 Subject: [pypy-dev] RPython function callback from C In-Reply-To: <662548.18463.qm@web114020.mail.gq1.yahoo.com> References: <662548.18463.qm@web114020.mail.gq1.yahoo.com> Message-ID: Hi, Le 14 ao?t 2010 03:51:00 UTC+2, Hart's Antler a ?crit : > I have the PortAudio blocking API working, simple reading and writing to the > sound card works. ?PortAudio also has an async API where samples are fed to > a callback as they stream in. ?But i'm not sure how to define a RPython > function that will be called as a callback from C, is this even possible? ?I > see some references in the source of rffi that seems to suggest it is > possible. Yes this is possible. See an example in pypy/rpython/lltypesystem/test/test_ll2ctypes.py in the function test_qsort_callback(). Why are you passing _callable=stream_callback? It should be enough to pass stream_callback directly as a function argument. -- Amaury Forgeot d'Arc From ademan555 at gmail.com Mon Aug 16 01:18:10 2010 From: ademan555 at gmail.com (Dan Roberts) Date: Sun, 15 Aug 2010 16:18:10 -0700 Subject: [pypy-dev] JIT Failure on lltype.Array access Message-ID: As best I can tell, the JIT cannot handle my code properly, it corrupts memory and returns 0.0 for float arrays. I don't know whether the true problem is in my code or the JIT, but I need to get this resolved quickly. I know the JIT and my code are interacting badly because py.py works fine (though slow) and translated pypy-c with jit and --jit threshold=9999999 both work fine. Here's what I've tried to resolve the issue: Removing my _immutable_fields_ hints. Hand implementing bh_{get,set}arrayitem_raw_{r,i,f} (though I don't know my implementation was right, I simply copied the gc version and removed the first offset (since raw arrays have no header right? 
Although I expect that the gc version would have simply gotten 0 for the header size... I tried it anyways) A few thoughts: descr.py alludes to a FloatArrayDescr which I never raw defined Could the asm backend be part of the problem? Rather than the code in llmodel.py? Unfortunately I'm ill equipped to resolve this issue, so any help is appreciated (I'm on my phone but I'll happily furnish exact errors upon request) -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobami at googlemail.com Mon Aug 16 08:19:33 2010 From: tobami at googlemail.com (Miquel Torres) Date: Mon, 16 Aug 2010 08:19:33 +0200 Subject: [pypy-dev] PyPy speed center not updating any more In-Reply-To: <20100814142321.GA4071@code0.codespeak.net> References: <20100814135002.GA1941@code0.codespeak.net> <20100814142321.GA4071@code0.codespeak.net> Message-ID: Hi Armin, are all results going to be run on a branch now?. If you run results on a branch, but don't change the config on codespeed, the commit logs won't work because it will try to pull them from trunk 2010/8/14 Armin Rigo : > Hi all, > > On Sat, Aug 14, 2010 at 03:50:02PM +0200, Armin Rigo wrote: >> The PyPy speed center does not display any update more recent than July 29. > > Wrong (thanks Antonio). ?It's only the twisted_web benchmark that stops > at July 29; it was certainly remov ed at that date. ?For the others it > works as expected. > > The most recent results of today (76624) have been run on the > kill-caninline branch. > > > Armin. > _______________________________________________ > pypy-dev at codespeak.net > http://codespeak.net/mailman/listinfo/pypy-dev > From fijall at gmail.com Mon Aug 16 09:00:52 2010 From: fijall at gmail.com (Maciej Fijalkowski) Date: Mon, 16 Aug 2010 09:00:52 +0200 Subject: [pypy-dev] PyPy speed center not updating any more In-Reply-To: References: <20100814135002.GA1941@code0.codespeak.net> <20100814142321.GA4071@code0.codespeak.net> Message-ID: I disabled twisted_web because of run out of TCP connection problem. Regarding branches - how can we have branches visible with trunk side-by-side, submit that as a different interpreter? On Mon, Aug 16, 2010 at 8:19 AM, Miquel Torres wrote: > Hi Armin, > > are all results going to be run on a branch now?. > > If you run results on a branch, but don't change the config on > codespeed, the commit logs won't work because it will try to pull them > from trunk > > > 2010/8/14 Armin Rigo : >> Hi all, >> >> On Sat, Aug 14, 2010 at 03:50:02PM +0200, Armin Rigo wrote: >>> The PyPy speed center does not display any update more recent than July 29. >> >> Wrong (thanks Antonio). ?It's only the twisted_web benchmark that stops >> at July 29; it was certainly remov ed at that date. ?For the others it >> works as expected. >> >> The most recent results of today (76624) have been run on the >> kill-caninline branch. >> >> >> Armin. 
>> _______________________________________________ >> pypy-dev at codespeak.net >> http://codespeak.net/mailman/listinfo/pypy-dev >> > _______________________________________________ > pypy-dev at codespeak.net > http://codespeak.net/mailman/listinfo/pypy-dev > From arigo at tunes.org Mon Aug 16 15:06:48 2010 From: arigo at tunes.org (Armin Rigo) Date: Mon, 16 Aug 2010 15:06:48 +0200 Subject: [pypy-dev] PyPy speed center not updating any more In-Reply-To: References: <20100814135002.GA1941@code0.codespeak.net> <20100814142321.GA4071@code0.codespeak.net> Message-ID: <20100816130648.GA15483@code0.codespeak.net> Hi Miquel, On Mon, Aug 16, 2010 at 08:19:33AM +0200, Miquel Torres wrote: > are all results going to be run on a branch now?. No no, I just ran manually twice on a branch. A bientot, Armin. From arigo at tunes.org Mon Aug 16 15:10:33 2010 From: arigo at tunes.org (Armin Rigo) Date: Mon, 16 Aug 2010 15:10:33 +0200 Subject: [pypy-dev] JIT Failure on lltype.Array access In-Reply-To: References: Message-ID: <20100816131033.GB15483@code0.codespeak.net> Hi Dan, The issue was that the JIT was silently and incorrectly accepting the type lltype.Array(), which is a non-GC but with-length-prefix array, and it was (by mistake) considering it to be a GC array. That's where the errors come from. Now the JIT explicitly refuses to work with such arrays. As explained on IRC, you need anyway in micronumpy to use the type rffi.CArray(), which does not contain the length prefix. A bientot, Armin. From tobami at googlemail.com Mon Aug 16 16:15:39 2010 From: tobami at googlemail.com (Miquel Torres) Date: Mon, 16 Aug 2010 16:15:39 +0200 Subject: [pypy-dev] PyPy speed center not updating any more In-Reply-To: <20100816130648.GA15483@code0.codespeak.net> References: <20100814135002.GA1941@code0.codespeak.net> <20100814142321.GA4071@code0.codespeak.net> <20100816130648.GA15483@code0.codespeak.net> Message-ID: Maciej: sorry, we had this issue pending for a long time already. The best way would be to add a new project per branch. So instead of project = 'PyPy' save as project = 'experimental_branchX' then in the admin (the project entry will be created when the first results are saved), choose whether to "track" the project (show or hide in the changes view), and customize the commit log info (pull logs from the corresponding subdir in svn instead of trunk). Note: to avoid confusion, executables names are unique, so exe (interpreter) names will need to be different as well (it could be changed if needed) Cheers, Miquel 2010/8/16 Armin Rigo : > Hi Miquel, > > On Mon, Aug 16, 2010 at 08:19:33AM +0200, Miquel Torres wrote: >> are all results going to be run on a branch now?. > > No no, I just ran manually twice on a branch. > > > A bientot, > > Armin. > From bhartsho at yahoo.com Thu Aug 19 03:25:02 2010 From: bhartsho at yahoo.com (Hart's Antler) Date: Wed, 18 Aug 2010 18:25:02 -0700 (PDT) Subject: [pypy-dev] JIT'ed function performance degrades Message-ID: <786138.15701.qm@web114011.mail.gq1.yahoo.com> I am starting to learn how to use the JIT, and i'm confused why my function gets slower over time, twice as slow after running for a few minutes. Using a virtualizable did speed up my code, but it still has the degrading performance problem. I have yesterdays SVN and using 64bit with boehm. I understand boehm is slower, but overall my JIT'ed function is many times slower than un-jitted, is this expected behavior from boehm? 
code is here: http://pastebin.com/9VGJHpNa From sakesun at gmail.com Thu Aug 19 06:25:42 2010 From: sakesun at gmail.com (sakesun roykiatisak) Date: Thu, 19 Aug 2010 11:25:42 +0700 Subject: [pypy-dev] =?windows-1252?q?What=27s_wrong_with_=3E=3E=3E_open=28?= =?windows-1252?q?=92xxx=92=2C_=92w=92=29=2Ewrite=28=92stuff=92=29_?= =?windows-1252?q?=3F?= Message-ID: Hi, I encountered this quite a few times when learning pypy from internet resources: the code like this >>> open(?xxx?, ?w?).write(?stuff?) This code is not working on pypy because it rely on CPython refcounting behaviour. I don't get it. Why ? I thought the code should be similar to storing the file object in temporary variable like this >>> f = open('xxx', 'w') >>> f.write('stuff') >>> del f Also, I've tried that with both Jython and IronPython and they all work fine. Why does this cause problem to pypy ? Do I have to avoid writing code like this in the future ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From sakesun at gmail.com Thu Aug 19 06:49:36 2010 From: sakesun at gmail.com (sakesun roykiatisak) Date: Thu, 19 Aug 2010 11:49:36 +0700 Subject: [pypy-dev] =?windows-1252?q?What=27s_wrong_with_=3E=3E=3E_open=28?= =?windows-1252?q?=92xxx=92=2C_=92w=92=29=2Ewrite=28=92stuff=92=29_?= =?windows-1252?q?=3F?= In-Reply-To: References: Message-ID: That's make sense. I've tried on both IronPython and Jython with: ipy -c "open(?xxx?, ?w?).write(?stuff?)" jython -c "open(?xxx?, ?w?).write(?stuff?)" When the interpreter terminate the file is closed. That's why it didn't cause any problem. Perhaps, I should always use "with" statement from now on. >>> with open('xxx', 'w') as f: f.write('stuff') Thanks On Thu, Aug 19, 2010 at 11:40 AM, Aaron DeVore wrote: > If I understand correctly, PyPy will garbage collect (and close) the > file object at an indeterminate time. That time could be as long as > until the program exits. Because CPython uses reference counting, it > closes the file immediately after the file object goes out of scope. > > Of course, I may be entirely wrong. > > -Aaron DeVore > > On Wed, Aug 18, 2010 at 9:25 PM, sakesun roykiatisak > wrote: > > Hi, > > I encountered this quite a few times when learning pypy from internet > > resources: > > the code like this > >>>> open(?xxx?, ?w?).write(?stuff?) > > This code is not working on pypy because it rely on CPython refcounting > > behaviour. > > I don't get it. Why ? I thought the code should be similar to storing > the > > file object in temporary variable like this > >>>> f = open('xxx', 'w') > >>>> f.write('stuff') > >>>> del f > > Also, I've tried that with both Jython and IronPython and they all work > > fine. > > Why does this cause problem to pypy ? Do I have to avoid writing code > like > > this in the future ? > > _______________________________________________ > > pypy-dev at codespeak.net > > http://codespeak.net/mailman/listinfo/pypy-dev > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sakesun at gmail.com Thu Aug 19 07:07:47 2010 From: sakesun at gmail.com (sakesun roykiatisak) Date: Thu, 19 Aug 2010 12:07:47 +0700 Subject: [pypy-dev] =?windows-1252?q?What=27s_wrong_with_=3E=3E=3E_open=28?= =?windows-1252?q?=92xxx=92=2C_=92w=92=29=2Ewrite=28=92stuff=92=29_?= =?windows-1252?q?=3F?= In-Reply-To: References: Message-ID: A little problem is that, "with" statement is yet to work in pypy. :) On Thu, Aug 19, 2010 at 11:49 AM, sakesun roykiatisak wrote: > That's make sense. 
I've tried on both IronPython and Jython with: > > ipy -c "open(?xxx?, ?w?).write(?stuff?)" > jython -c "open(?xxx?, ?w?).write(?stuff?)" > > When the interpreter terminate the file is closed. That's why it didn't > cause any problem. > > Perhaps, I should always use "with" statement from now on. > > >>> with open('xxx', 'w') as f: f.write('stuff') > > Thanks > > On Thu, Aug 19, 2010 at 11:40 AM, Aaron DeVore wrote: > >> If I understand correctly, PyPy will garbage collect (and close) the >> file object at an indeterminate time. That time could be as long as >> until the program exits. Because CPython uses reference counting, it >> closes the file immediately after the file object goes out of scope. >> >> Of course, I may be entirely wrong. >> >> -Aaron DeVore >> >> On Wed, Aug 18, 2010 at 9:25 PM, sakesun roykiatisak >> wrote: >> > Hi, >> > I encountered this quite a few times when learning pypy from internet >> > resources: >> > the code like this >> >>>> open(?xxx?, ?w?).write(?stuff?) >> > This code is not working on pypy because it rely on CPython refcounting >> > behaviour. >> > I don't get it. Why ? I thought the code should be similar to storing >> the >> > file object in temporary variable like this >> >>>> f = open('xxx', 'w') >> >>>> f.write('stuff') >> >>>> del f >> > Also, I've tried that with both Jython and IronPython and they all work >> > fine. >> > Why does this cause problem to pypy ? Do I have to avoid writing code >> like >> > this in the future ? >> > _______________________________________________ >> > pypy-dev at codespeak.net >> > http://codespeak.net/mailman/listinfo/pypy-dev >> > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex.gaynor at gmail.com Thu Aug 19 07:09:25 2010 From: alex.gaynor at gmail.com (Alex Gaynor) Date: Thu, 19 Aug 2010 00:09:25 -0500 Subject: [pypy-dev] =?utf-8?b?V2hhdCdzIHdyb25nIHdpdGggPj4+IG9wZW4o4oCZeHh4?= =?utf-8?b?4oCZLCDigJl34oCZKS53cml0ZSjigJlzdHVmZuKAmSkgPw==?= In-Reply-To: References: Message-ID: On Thu, Aug 19, 2010 at 12:07 AM, sakesun roykiatisak wrote: > > A little problem is that, "with" statement is yet to work in pypy. > :) > > On Thu, Aug 19, 2010 at 11:49 AM, sakesun roykiatisak > wrote: >> >> That's make sense. ?I've tried on both IronPython and Jython with: >> ipy -c "open(?xxx?, ?w?).write(?stuff?)" >> jython -c "open(?xxx?, ?w?).write(?stuff?)" >> When the interpreter terminate the file is closed. That's why it didn't >> cause any problem. >> Perhaps, I should always use "with" statement from now on. >> >>> with open('xxx', 'w') as f: f.write('stuff') >> Thanks >> >> On Thu, Aug 19, 2010 at 11:40 AM, Aaron DeVore >> wrote: >>> >>> If I understand correctly, PyPy will garbage collect (and close) the >>> file object at an indeterminate time. That time could be as long as >>> until the program exits. Because CPython uses reference counting, it >>> closes the file immediately after the file object goes out of scope. >>> >>> Of course, I may be entirely wrong. >>> >>> -Aaron DeVore >>> >>> On Wed, Aug 18, 2010 at 9:25 PM, sakesun roykiatisak >>> wrote: >>> > Hi, >>> > ?I encountered this quite a few times when learning pypy from internet >>> > resources: >>> > ??the code like this >>> >>>> open(?xxx?, ?w?).write(?stuff?) >>> > This code is not working on pypy because it rely on CPython refcounting >>> > behaviour. >>> > I don't get it. Why ? 
?I thought the code should be similar to storing >>> > the >>> > file object in temporary variable like this >>> >>>> f = open('xxx', 'w') >>> >>>> f.write('stuff') >>> >>>> del f >>> > Also, I've tried that with both Jython and IronPython and they all work >>> > fine. >>> > Why does this cause problem to pypy ? ?Do I have to avoid writing code >>> > like >>> > this in the future ? >>> > _______________________________________________ >>> > pypy-dev at codespeak.net >>> > http://codespeak.net/mailman/listinfo/pypy-dev >>> > >> > > > _______________________________________________ > pypy-dev at codespeak.net > http://codespeak.net/mailman/listinfo/pypy-dev > Since PyPy implements Python 2.5 at present you'll need to use `from __future__ import with_statement` to ues it. Alex -- "I disapprove of what you say, but I will defend to the death your right to say it." -- Voltaire "The people's good is the highest law." -- Cicero "Code can always be simpler than you think, but never as simple as you want" -- Me From sakesun at gmail.com Thu Aug 19 07:12:46 2010 From: sakesun at gmail.com (sakesun roykiatisak) Date: Thu, 19 Aug 2010 12:12:46 +0700 Subject: [pypy-dev] =?windows-1252?q?What=27s_wrong_with_=3E=3E=3E_open=28?= =?windows-1252?q?=92xxx=92=2C_=92w=92=29=2Ewrite=28=92stuff=92=29_?= =?windows-1252?q?=3F?= In-Reply-To: References: Message-ID: Wow, thanks. Pypy is a really precise implementation. > Since PyPy implements Python 2.5 at present you'll need to use `from > __future__ import with_statement` to ues it. > > Alex > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From william.leslie.ttg at gmail.com Thu Aug 19 07:13:36 2010 From: william.leslie.ttg at gmail.com (William Leslie) Date: Thu, 19 Aug 2010 15:13:36 +1000 Subject: [pypy-dev] =?windows-1252?q?What=27s_wrong_with_=3E=3E=3E_open=28?= =?windows-1252?q?=92xxx=92=2C_=92w=92=29=2Ewrite=28=92stuff=92=29_?= =?windows-1252?q?=3F?= In-Reply-To: References: Message-ID: A good resource I recently read on this is this entry in Raymond Chen's blog: http://blogs.msdn.com/b/oldnewthing/archive/2010/08/09/10047586.aspx Together with the following entry, which explains why the lifetime of the variable has nothing to do with the lifetime of the object, this should help you understand. You should consider automatically closing a file to be an implementation detail, even cpython may not respect such semantics in future. That is why the with statement was created. -- William Leslie From sakesun at gmail.com Thu Aug 19 07:20:35 2010 From: sakesun at gmail.com (sakesun roykiatisak) Date: Thu, 19 Aug 2010 12:20:35 +0700 Subject: [pypy-dev] =?windows-1252?q?What=27s_wrong_with_=3E=3E=3E_open=28?= =?windows-1252?q?=92xxx=92=2C_=92w=92=29=2Ewrite=28=92stuff=92=29_?= =?windows-1252?q?=3F?= In-Reply-To: References: Message-ID: Thanks. Interestingly, this is not the first time I was suggested to pursue further reading with Raymond Chen's blog. http://www.mail-archive.com/users at lists.ironpython.com/msg05792.html :) On Thu, Aug 19, 2010 at 12:13 PM, William Leslie < william.leslie.ttg at gmail.com> wrote: > A good resource I recently read on this is this entry in Raymond Chen's > blog: > > http://blogs.msdn.com/b/oldnewthing/archive/2010/08/09/10047586.aspx > > Together with the following entry, which explains why the lifetime of > the variable has nothing to do with the lifetime of the object, this > should help you understand. 
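Putting Alex's note and the suggestions above together, a minimal sketch of the portable ways to make sure the data actually reaches the file without relying on the garbage collector (file name 'xxx' as in the thread; the __future__ import is only needed on Python 2.5-level interpreters such as PyPy at that time):

    from __future__ import with_statement  # Python 2.5 only; harmless on 2.6+

    # explicit close, works on every implementation
    f = open('xxx', 'w')
    try:
        f.write('stuff')
    finally:
        f.close()

    # the with statement closes the file automatically, even on errors
    with open('xxx', 'w') as f:
        f.write('stuff')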
> > You should consider automatically closing a file to be an > implementation detail, even cpython may not respect such semantics in > future. That is why the with statement was created. > > -- > William Leslie > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.giarrusso at gmail.com Thu Aug 19 08:48:05 2010 From: p.giarrusso at gmail.com (Paolo Giarrusso) Date: Thu, 19 Aug 2010 08:48:05 +0200 Subject: [pypy-dev] JIT'ed function performance degrades In-Reply-To: <786138.15701.qm@web114011.mail.gq1.yahoo.com> References: <786138.15701.qm@web114011.mail.gq1.yahoo.com> Message-ID: On Thu, Aug 19, 2010 at 03:25, Hart's Antler wrote: > I am starting to learn how to use the JIT, and i'm confused why my function gets slower over time, twice as slow after running for a few minutes. ?Using a virtualizable did speed up my code, but it still has the degrading performance problem. ?I have yesterdays SVN and using 64bit with boehm. ?I understand boehm is slower, but overall my JIT'ed function is many times slower than un-jitted, is this expected behavior from boehm? > > code is here: > http://pastebin.com/9VGJHpNa I think this has nothing to do with Boehm. Is it swapping? If yes, that explains the slowdown. Is memory usage growing over time? I expect yes, and it's a misbehavior which could be explained by my analysis below. Is it JITting code? I think no, or not to an advantage, but that's a more complicated guess. BTW, when debugging such things, _always_ ask and answer these questions yourself. Moreover, I'm not sure you need to use the JIT yourself. - Your code is RPython, so you could as well just translate it without JIT annotations, and it will be compiled to C code. - Otherwise, you could write that as a app-level function, i.e. in normal Python, and pass it to a translated PyPy-JIT interpreter. Did you try and benchmark the code? Can I ask you why you did not write that as a app-level function, i.e. as normal Python code, to use PyPy's JIT directly, without needing detailed understanding of the JIT? It would be interesting to see a comparison (and have it on the web, after some code review). Especially, I'm not sure that as currently written you're getting any speedup, and I seriously wonder whether the JIT could give an additional speedup over RPython here (the regexp interpreter is a completely different case, since it compiles a regexp, but why do you compile an array?). I think just raw CPython can be 340x slower than C (I assume NumPy uses C), and since your code is RPython, there must be something basic wrong. I think you have too many green variables in your code: "At runtime, for a given value of the green variables, one piece of machine code will be generated. This piece of machine code can therefore assume that the value of the green variable is constant." [1] So, every time you change the value of a green variable, the JIT will have to recompile again the function. Note that actually, I think, for each new value of the variable, first a given number of iterations have to occur (1000? 10 000? I'm not sure), then the JIT will spend time creating a trace and compiling it. The length of the involved arrays is maybe around the threshold, maybe smaller, so you get "all pain, and no gain". 
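Before the actual driver is quoted just below, it may help to sketch how greens and reds are normally split for a small interpreter-style loop (a hypothetical toy example, not the pastebin code; the import path and keyword-argument calling convention are the ones used in the 2010 PyPy tree):

    from pypy.rlib.jit import JitDriver

    # green: constant for one compiled trace (the "program" and the position in it)
    # red:   everything that changes from iteration to iteration
    driver = JitDriver(greens=['pc', 'program'], reds=['acc', 'data'])

    def run(program, data):
        pc = 0
        acc = 0.0
        while pc < len(program):
            driver.can_enter_jit(pc=pc, program=program, acc=acc, data=data)
            driver.jit_merge_point(pc=pc, program=program, acc=acc, data=data)
            op = program[pc]
            if op == '+':
                acc += data[pc % len(data)]
            elif op == '*':
                acc *= 2.0
            pc += 1
        return acc

With this split, one piece of machine code is generated per (pc, program) pair, i.e. per loop in the interpreted program, and the red values stay free to change on every pass.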
>From your code: complex_dft_jitdriver = JitDriver( greens = 'index length accum array'.split(), reds = 'k a b J'.split(), virtualizables = 'a'.split() #can_inline=True ) The only acceptable green variable are IMHO array and length there, because in the calling code, the other change for each invocation I think. I also think that only length should be green (and that could give a speedup), and that marking array as green gives neglibible or no speedup. Marking length as green allows specializing the function on the size of the array - something one would not do in C probably, but that one could do in C++. Whether it is worth it depends on the specific code & optimizations available - I think here the speedup should be small. Best regards [1] http://morepypy.blogspot.com/2010/06/jit-for-regular-expression-matching.html -- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/ From fijall at gmail.com Thu Aug 19 12:03:17 2010 From: fijall at gmail.com (Maciej Fijalkowski) Date: Thu, 19 Aug 2010 12:03:17 +0200 Subject: [pypy-dev] =?utf-8?b?V2hhdCdzIHdyb25nIHdpdGggPj4+IG9wZW4o4oCZeHh4?= =?utf-8?b?4oCZLCDigJl34oCZKS53cml0ZSjigJlzdHVmZuKAmSkgPw==?= In-Reply-To: References: Message-ID: Hi. Yes, those two things are equivalent and they both work. However, if you try to read the file immediately after deleting the variable, you'll find out that the file is empty on any implementation but cpython. On Thu, Aug 19, 2010 at 6:25 AM, sakesun roykiatisak wrote: > Hi, > ?I encountered this quite a few times when learning pypy from internet > resources: > ??the code like this >>>> open(?xxx?, ?w?).write(?stuff?) > This code is not working on pypy because it rely on CPython refcounting > behaviour. > I don't get it. Why ? ?I thought the code should be similar to storing the > file object in temporary variable like this >>>> f = open('xxx', 'w') >>>> f.write('stuff') >>>> del f > Also, I've tried that with both Jython and IronPython and they all work > fine. > Why does this cause problem to pypy ? ?Do I have to avoid writing code like > this in the future ? > _______________________________________________ > pypy-dev at codespeak.net > http://codespeak.net/mailman/listinfo/pypy-dev > From fijall at gmail.com Thu Aug 19 12:11:29 2010 From: fijall at gmail.com (Maciej Fijalkowski) Date: Thu, 19 Aug 2010 12:11:29 +0200 Subject: [pypy-dev] JIT'ed function performance degrades In-Reply-To: References: <786138.15701.qm@web114011.mail.gq1.yahoo.com> Message-ID: Hi On Thu, Aug 19, 2010 at 8:48 AM, Paolo Giarrusso wrote: > On Thu, Aug 19, 2010 at 03:25, Hart's Antler wrote: >> I am starting to learn how to use the JIT, and i'm confused why my function gets slower over time, twice as slow after running for a few minutes. ?Using a virtualizable did speed up my code, but it still has the degrading performance problem. ?I have yesterdays SVN and using 64bit with boehm. ?I understand boehm is slower, but overall my JIT'ed function is many times slower than un-jitted, is this expected behavior from boehm? >> >> code is here: >> http://pastebin.com/9VGJHpNa > > I think this has nothing to do with Boehm. I don't think as well > Moreover, I'm not sure you need to use the JIT yourself. > - Your code is RPython, so you could as well just translate it without > JIT annotations, and it will be compiled to C code. > - Otherwise, you could write that as a app-level function, i.e. in > normal Python, and pass it to a translated PyPy-JIT interpreter. Did > you try and benchmark the code? 
> Can I ask you why you did not write that as a app-level function, i.e. > as normal Python code, to use PyPy's JIT directly, without needing > detailed understanding of the JIT? > It would be interesting to see a comparison (and have it on the web, > after some code review). JIT can essentially speed up based on constant folding based on bytecode. Bytecode should be the only green variable here and all others (that you don't want to specialize over) should be red and not promoted. In your case it's very likely you compile new loop very often (overspecialization). > > Especially, I'm not sure that as currently written you're getting any > speedup, and I seriously wonder whether the JIT could give an > additional speedup over RPython here (the regexp interpreter is a > completely different case, since it compiles a regexp, but why do you > compile an array?). That's silly, our python interpreter is an RPython program. Anything that can have a meaningfully defined "bytecode" or a "compile time constant" can be sped up by the JIT. For example a templating language. > I think just raw CPython can be 340x slower than C (I assume NumPy > uses C) You should check more and have less assumptions. > > So, every time you change the value of a green variable, the JIT will > have to recompile again the function. Note that actually, I think, for > each new value of the variable, first a given number of iterations > have to occur (1000? 10 000? I'm not sure), then the JIT will spend > time creating a trace and compiling it. The length of the involved > arrays is maybe around the threshold, maybe smaller, so you get "all > pain, and no gain". > to be precise for each combination of green variables there has to be a 1000 (by default) iterations. If there is no such thing, you'll never compile code and simply spend time bookkeeping. Cheers, fijal From p.giarrusso at gmail.com Thu Aug 19 13:34:12 2010 From: p.giarrusso at gmail.com (Paolo Giarrusso) Date: Thu, 19 Aug 2010 13:34:12 +0200 Subject: [pypy-dev] JIT'ed function performance degrades In-Reply-To: References: <786138.15701.qm@web114011.mail.gq1.yahoo.com> Message-ID: Hi Maciej, I think you totally misunderstood me, possibly because I was not clear, see below. In short, I was wondering whether the approach of the original code made any sense, and my guess was "mostly not", exactly because there is little constant folding possible in the code, as it is written. [Hart, I don't think that any O(N^2) implementation of DFT (what is in the code), i.e. two nested for loops, should be written to explicitly take advantage of the JIT. I don't know about the FFT algorithm, but a few vague ideas say "yes", because constant folding the length could _maybe_ allow constant folding the permutations applied to data in the Cooley?Tukey FFT algorithm.] On Thu, Aug 19, 2010 at 12:11, Maciej Fijalkowski wrote: > Hi >> Moreover, I'm not sure you need to use the JIT yourself. >> - Your code is RPython, so you could as well just translate it without >> JIT annotations, and it will be compiled to C code. >> - Otherwise, you could write that as a app-level function, i.e. in >> normal Python, and pass it to a translated PyPy-JIT interpreter. Did >> you try and benchmark the code? >> Can I ask you why you did not write that as a app-level function, i.e. >> as normal Python code, to use PyPy's JIT directly, without needing >> detailed understanding of the JIT? >> It would be interesting to see a comparison (and have it on the web, >> after some code review). 
> > JIT can essentially speed up based on constant folding based on > bytecode. Bytecode should be the only green variable here and all > others (that you don't want to specialize over) should be red and not > promoted. In your case it's very likely you compile new loop very > often (overspecialization). I see no bytecode in the example - it's a DFT implementation. For each combination of green variables, there are 1024 iterations, and there are 1024 such combinations, so overspecialization is almost guaranteed. My next question, inspired from the specific code, is: is JITted code ever thrown away, if too much is generated? Even for valid use cases, most JITs can generate too much code, and they need then to choose what to keep and what to throw away. >> Especially, I'm not sure that as currently written you're getting any >> speedup, and I seriously wonder whether the JIT could give an >> additional speedup over RPython here (the regexp interpreter is a >> completely different case, since it compiles a regexp, but why do you >> compile an array?). > > That's silly, our python interpreter is an RPython program. Anything > that can have a meaningfully defined "bytecode" or a "compile time > constant" can be sped up by the JIT. For example a templating > language. You misunderstood me, I totally agree with you, and my understanding is that in the given program (which I read almost fully) constant folding makes little sense. Since that program is written with RPython + JIT, but it has green variables which are not at all "compile time constants", "I wonder seriously" was meant as "I wonder seriously whether what you are trying makes any sense". As I argued, the only constant folding possible is for the array length. And again, I wonder whether it's worth it, my guess tends towards "no", but a benchmark is needed (there will be some improvement probably). I was just a bit vaguer because I just studied docs on PyPy (and papers about tracing compilation). But your answer confirms that my original analysis is correct, and that I should write more clearly maybe. >> I think just raw CPython can be 340x slower than C (I assume NumPy >> uses C) > You should check more and have less assumptions. I did some checks, on PyPy's blog actually, not definitive though, and I stand by what I meant (see below). Without reading the pastie in full, however, my comments are out of context. I guess your tone is fine, since you thought I wrote nonsense. But in general, I have yet to see a guideline forbidding "IIRC" and similar ways of discussing (the above was an _educated_ guess), especially when the writer remembers correctly (as in this case). Having said that, I'm always happy to see counterexamples and learn something, if they exist. In this case, for what I actually meant (and wrote, IMHO), a counterexample would be a RPython or a JITted program >= 340x slower than C. For the speed ratio, the code pastie writes that RPython JITted code is 340x slower than NumPy code, and I was writing that it's unreasonable; in this case, it happens because of overspecialization caused by misuse of the JIT. For speed ratios among CPython, C, RPython, I was comparing to http://morepypy.blogspot.com/2010/06/jit-for-regular-expression-matching.html. What I meant is that JITted code can't be so much slower than C. For NumPy, I had read this: http://morepypy.blogspot.com/2009/07/pypy-numeric-experiments.html, and it mostly implies that NumPy is written in C (it actually says "NumPy's C version", but I missed it). 
And for the specific discussed microbenchmark, the performance gap between NumPy and CPython is ~100x. Best regards -- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/ From fijall at gmail.com Thu Aug 19 13:55:00 2010 From: fijall at gmail.com (Maciej Fijalkowski) Date: Thu, 19 Aug 2010 13:55:00 +0200 Subject: [pypy-dev] JIT'ed function performance degrades In-Reply-To: References: <786138.15701.qm@web114011.mail.gq1.yahoo.com> Message-ID: On Thu, Aug 19, 2010 at 1:34 PM, Paolo Giarrusso wrote: > Hi Maciej, > I think you totally misunderstood me, possibly because I was not > clear, see below. In short, I was wondering whether the approach of > the original code made any sense, and my guess was "mostly not", > exactly because there is little constant folding possible in the code, > as it is written. That's always possible :) > > [Hart, I don't think that any O(N^2) implementation of DFT (what is in > the code), i.e. two nested for loops, should be written to explicitly > take advantage of the JIT. I don't know about the FFT algorithm, but a > few vague ideas say "yes", because constant folding the length could > _maybe_ allow constant folding the permutations applied to data in the > Cooley?Tukey FFT algorithm.] > > On Thu, Aug 19, 2010 at 12:11, Maciej Fijalkowski wrote: >> Hi > > >>> Moreover, I'm not sure you need to use the JIT yourself. >>> - Your code is RPython, so you could as well just translate it without >>> JIT annotations, and it will be compiled to C code. >>> - Otherwise, you could write that as a app-level function, i.e. in >>> normal Python, and pass it to a translated PyPy-JIT interpreter. Did >>> you try and benchmark the code? >>> Can I ask you why you did not write that as a app-level function, i.e. >>> as normal Python code, to use PyPy's JIT directly, without needing >>> detailed understanding of the JIT? >>> It would be interesting to see a comparison (and have it on the web, >>> after some code review). >> >> JIT can essentially speed up based on constant folding based on >> bytecode. Bytecode should be the only green variable here and all >> others (that you don't want to specialize over) should be red and not >> promoted. In your case it's very likely you compile new loop very >> often (overspecialization). > > I see no bytecode in the example - it's a DFT implementation. > For each combination of green variables, there are 1024 iterations, > and there are 1024 such combinations, so overspecialization is almost > guaranteed. Agreed. > > My next question, inspired from the specific code, is: is JITted code > ever thrown away, if too much is generated? Even for valid use cases, > most JITs can generate too much code, and they need then to choose > what to keep and what to throw away. No, as of now, never. In general in case of Python it would have to be a heuristic anyway (since code objects are mostly immortal and you can't decide whether certain combination of assumptions will occur in the future or not). We have some ideas which code will never run any more and besides that, we need to implement some heuristics when to throw away code. > >>> Especially, I'm not sure that as currently written you're getting any >>> speedup, and I seriously wonder whether the JIT could give an >>> additional speedup over RPython here (the regexp interpreter is a >>> completely different case, since it compiles a regexp, but why do you >>> compile an array?). >> >> That's silly, our python interpreter is an RPython program. 
Anything >> that can have a meaningfully defined "bytecode" or a "compile time >> constant" can be sped up by the JIT. For example a templating >> language. > > You misunderstood me, I totally agree with you, and my understanding > is that in the given program (which I read almost fully) constant > folding makes little sense. Great :) I might have misunderstood you. > Since that program is written with RPython + JIT, but it has green > variables which are not at all "compile time constants", "I wonder > seriously" was meant as "I wonder seriously whether what you are > trying makes any sense". As I argued, the only constant folding > possible is for the array length. And again, I wonder whether it's > worth it, my guess tends towards "no", but a benchmark is needed > (there will be some improvement probably). I guess the answer is "hell no", simply because if you don't constant fold our assembler would not be nearly as good as gcc's one (if nothing else). > > I was just a bit vaguer because I just studied docs on PyPy (and > papers about tracing compilation). But your answer confirms that my > original analysis is correct, and that I should write more clearly > maybe. > >>> I think just raw CPython can be 340x slower than C (I assume NumPy >>> uses C) > >> You should check more and have less assumptions. > > I did some checks, on PyPy's blog actually, not definitive though, and > I stand by what I meant (see below). Without reading the pastie in > full, however, my comments are out of context. > I guess your tone is fine, since you thought I wrote nonsense. But in > general, I have yet to see a guideline forbidding "IIRC" and similar > ways of discussing (the above was an _educated_ guess), especially > when the writer remembers correctly (as in this case). > Having said that, I'm always happy to see counterexamples and learn > something, if they exist. In this case, for what I actually meant (and > wrote, IMHO), a counterexample would be a RPython or a JITted program >>= 340x slower than C. My comment was merely about "numpy is written in C". > > For the speed ratio, the code pastie writes that RPython JITted code > is 340x slower than NumPy code, and I was writing that it's > unreasonable; in this case, it happens because of overspecialization > caused by misuse of the JIT. Yes. > > For speed ratios among CPython, C, RPython, I was comparing to > http://morepypy.blogspot.com/2010/06/jit-for-regular-expression-matching.html. > What I meant is that JITted code can't be so much slower than C. > > For NumPy, I had read this: > http://morepypy.blogspot.com/2009/07/pypy-numeric-experiments.html, > and it mostly implies that NumPy is written in C (it actually says > "NumPy's C version", but I missed it). And for the specific discussed > microbenchmark, the performance gap between NumPy and CPython is > ~100x. Yes, there is a slight difference :-) numpy is written mostly in C (at least glue code), but a lot of algorithms call back to some other stuff (depending what you have installed) which as far as I'm concerned might be whatever (most likely fortran or SSE assembler at some level.) > > Best regards > -- > Paolo Giarrusso - Ph.D. 
Student > http://www.informatik.uni-marburg.de/~pgiarrusso/ > From bhartsho at yahoo.com Fri Aug 20 08:04:48 2010 From: bhartsho at yahoo.com (Hart's Antler) Date: Thu, 19 Aug 2010 23:04:48 -0700 (PDT) Subject: [pypy-dev] JIT'ed function performance degrades In-Reply-To: Message-ID: <23133.28490.qm@web114006.mail.gq1.yahoo.com> Hi Paolo, thanks for your in-depth response, i tried your suggestions and noticed a big speed improvement with no more degrading performance, i didn't realize having more green is bad. However it still runs 4x slower than just plain old compiled RPython, i checked if the JIT was really running, and your right its not actually using any JIT'ed code, it only traces and then aborts, though now i can not figure out why it aborts after trying several things. I didn't write this as an app-level function because i wanted to understand how the JIT works on a deeper level and with RPython. I had seen the blog post before by Carl Friedrich Bolz about JIT'ing and that he was able to speed things up 22x faster than plain RPython translated to C, so that got me curious about the JIT. Now i understand that that was an exceptional case, but what other cases might RPython+JIT be useful? And its good to see here what if any speed up there will be in the worst case senairo. Sorry about all the confusion about numpy being 340x faster, i should have added in that note that i compared numpy fast fourier transform to Rpython direct fourier transform, and direct is known to be hundreds of times slower. (numpy lacks a DFT to compare to) updated code with only the length as green: http://pastebin.com/DnJikXze The jitted function now checks jit.we_are_jitted(), and prints 'unjitted' if there is no jitting. abort: trace too long seems to happen every trace, so we_are_jitted() is never true, and the 4x overhead compared to compiled RPython is then understandable. trace_limit is set to its maximum, so why is it aborting? Here is my settings: jitdriver.set_param('threshold', 4) jitdriver.set_param('trace_eagerness', 4) jitdriver.set_param('trace_limit', sys.maxint) jitdriver.set_param('debug', 3) Tracing: 80 1.019871 Backend: 0 0.000000 Running asm: 0 Blackhole: 80 TOTAL: 16.785704 ops: 1456160 recorded ops: 1200000 calls: 99080 guards: 430120 opt ops: 0 opt guards: 0 forcings: 0 abort: trace too long: 80 abort: compiling: 0 abort: vable escape: 0 nvirtuals: 0 nvholes: 0 nvreused: 0 --- On Thu, 8/19/10, Maciej Fijalkowski wrote: > From: Maciej Fijalkowski > Subject: Re: [pypy-dev] JIT'ed function performance degrades > To: "Paolo Giarrusso" > Cc: "Hart's Antler" , pypy-dev at codespeak.net > Date: Thursday, 19 August, 2010, 4:55 AM > On Thu, Aug 19, 2010 at 1:34 PM, > Paolo Giarrusso > wrote: > > Hi Maciej, > > I think you totally misunderstood me, possibly because > I was not > > clear, see below. In short, I was wondering whether > the approach of > > the original code made any sense, and my guess was > "mostly not", > > exactly because there is little constant folding > possible in the code, > > as it is written. > > That's always possible :) > > > > > [Hart, I don't think that any O(N^2) implementation of > DFT (what is in > > the code), i.e. two nested for loops, should be > written to explicitly > > take advantage of the JIT. I don't know about the FFT > algorithm, but a > > few vague ideas say "yes", because constant folding > the length could > > _maybe_ allow constant folding the permutations > applied to data in the > > Cooley?Tukey FFT algorithm.] 
> > > > On Thu, Aug 19, 2010 at 12:11, Maciej Fijalkowski > > wrote: > >> Hi > > > > > >>> Moreover, I'm not sure you need to use the JIT > yourself. > >>> - Your code is RPython, so you could as well > just translate it without > >>> JIT annotations, and it will be compiled to C > code. > >>> - Otherwise, you could write that as a > app-level function, i.e. in > >>> normal Python, and pass it to a translated > PyPy-JIT interpreter. Did > >>> you try and benchmark the code? > >>> Can I ask you why you did not write that as a > app-level function, i.e. > >>> as normal Python code, to use PyPy's JIT > directly, without needing > >>> detailed understanding of the JIT? > >>> It would be interesting to see a comparison > (and have it on the web, > >>> after some code review). > >> > >> JIT can essentially speed up based on constant > folding based on > >> bytecode. Bytecode should be the only green > variable here and all > >> others (that you don't want to specialize over) > should be red and not > >> promoted. In your case it's very likely you > compile new loop very > >> often (overspecialization). > > > > I see no bytecode in the example - it's a DFT > implementation. > > For each combination of green variables, there are > 1024 iterations, > > and there are 1024 such combinations, so > overspecialization is almost > > guaranteed. > > Agreed. > > > > > My next question, inspired from the specific code, is: > is JITted code > > ever thrown away, if too much is generated? Even for > valid use cases, > > most JITs can generate too much code, and they need > then to choose > > what to keep and what to throw away. > > No, as of now, never. In general in case of Python it would > have to be > a heuristic anyway (since code objects are mostly immortal > and you > can't decide whether certain combination of assumptions > will occur in > the future or not). We have some ideas which code will > never run any > more and besides that, we need to implement some heuristics > when to > throw away code. > > > > >>> Especially, I'm not sure that as currently > written you're getting any > >>> speedup, and I seriously wonder whether the > JIT could give an > >>> additional speedup over RPython here (the > regexp interpreter is a > >>> completely different case, since it compiles a > regexp, but why do you > >>> compile an array?). > >> > >> That's silly, our python interpreter is an RPython > program. Anything > >> that can have a meaningfully defined "bytecode" or > a "compile time > >> constant" can be sped up by the JIT. For example a > templating > >> language. > > > > You misunderstood me, I totally agree with you, and my > understanding > > is that in the given program (which I read almost > fully) constant > > folding makes little sense. > > Great :) I might have misunderstood you. > > > Since that program is written with RPython + JIT, but > it has green > > variables which are not at all "compile time > constants", "I wonder > > seriously" was meant as "I wonder seriously whether > what you are > > trying makes any sense". As I argued, the only > constant folding > > possible is for the array length. And again, I wonder > whether it's > > worth it, my guess tends towards "no", but a benchmark > is needed > > (there will be some improvement probably). > > I guess the answer is "hell no", simply because if you > don't constant > fold our assembler would not be nearly as good as gcc's one > (if > nothing else). 
> > > > > I was just a bit vaguer because I just studied docs on > PyPy (and > > papers about tracing compilation). But your answer > confirms that my > > original analysis is correct, and that I should write > more clearly > > maybe. > > > >>> I think just raw CPython can be 340x slower > than C (I assume NumPy > >>> uses C) > > > >> You should check more and have less assumptions. > > > > I did some checks, on PyPy's blog actually, not > definitive though, and > > I stand by what I meant (see below). Without reading > the pastie in > > full, however, my comments are out of context. > > I guess your tone is fine, since you thought I wrote > nonsense. But in > > general, I have yet to see a guideline forbidding > "IIRC" and similar > > ways of discussing (the above was an _educated_ > guess), especially > > when the writer remembers correctly (as in this > case). > > Having said that, I'm always happy to see > counterexamples and learn > > something, if they exist. In this case, for what I > actually meant (and > > wrote, IMHO), a counterexample would be a RPython or a > JITted program > >>= 340x slower than C. > > My comment was merely about "numpy is written in C". > > > > > For the speed ratio, the code pastie writes that > RPython JITted code > > is 340x slower than NumPy code, and I was writing that > it's > > unreasonable; in this case, it happens because of > overspecialization > > caused by misuse of the JIT. > > Yes. > > > > > For speed ratios among CPython, C, RPython, I was > comparing to > > http://morepypy.blogspot.com/2010/06/jit-for-regular-expression-matching.html. > > What I meant is that JITted code can't be so much > slower than C. > > > > For NumPy, I had read this: > > http://morepypy.blogspot.com/2009/07/pypy-numeric-experiments.html, > > and it mostly implies that NumPy is written in C (it > actually says > > "NumPy's C version", but I missed it). And for the > specific discussed > > microbenchmark, the performance gap between NumPy and > CPython is > > ~100x. > > Yes, there is a slight difference :-) numpy is written > mostly in C (at > least glue code), but a lot of algorithms call back to some > other > stuff (depending what you have installed) which as far as > I'm > concerned might be whatever (most likely fortran or SSE > assembler at > some level.) > > > > > Best regards > > -- > > Paolo Giarrusso - Ph.D. Student > > http://www.informatik.uni-marburg.de/~pgiarrusso/ > > > From timon.elviejo at gmail.com Fri Aug 20 09:58:07 2010 From: timon.elviejo at gmail.com (=?ISO-8859-1?Q?Jorge_Tim=F3n?=) Date: Fri, 20 Aug 2010 09:58:07 +0200 Subject: [pypy-dev] gpgpu and pypy Message-ID: Hi, I'm just curious about the feasibility of running python code in a gpu by extending pypy. I don't have the time (and probably the knowledge neither) to develop that pypy extension, but I just want to know if it's possible. I'm interested in languages like openCL and nvidia's CUDA because I think the future of supercomputing is going to be GPGPU. There's people working in bringing GPGPU to python: http://mathema.tician.de/software/pyopencl http://mathema.tician.de/software/pycuda Would it be possible to run python code in parallel without the need (for the developer) of actively parallelizing the code? I'm not talking about code of hard concurrency, but of code with intrinsic parallelism (let's say matrix multiplication). Would a JIT compilation be capable of detecting parallelism? Would it be interesting or that's a job we must leave to humans by now? What do you think? 
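For reference, the explicit-parallelism style those two libraries expose looks roughly like the following PyOpenCL sketch (adapted from the kind of demo shipped with pyopencl; it assumes numpy, pyopencl and a working OpenCL driver, and API details may differ between versions):

    import numpy
    import pyopencl as cl

    a = numpy.random.rand(50000).astype(numpy.float32)
    b = numpy.random.rand(50000).astype(numpy.float32)

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    c_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

    # the kernel itself is still written in OpenCL C, not Python
    prg = cl.Program(ctx, """
    __kernel void add(__global const float *a,
                      __global const float *b,
                      __global float *c) {
        int gid = get_global_id(0);
        c[gid] = a[gid] + b[gid];
    }
    """).build()

    prg.add(queue, a.shape, None, a_buf, b_buf, c_buf)

    c = numpy.empty_like(a)
    cl.enqueue_copy(queue, c, c_buf)  # older releases spell this enqueue_read_buffer

The parallel part has to be written out explicitly as a kernel; nothing here is inferred automatically from ordinary Python code.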
I don't know if I had explain myself because English is not my first language. Cheers, Jorge Tim?n -------------- next part -------------- An HTML attachment was scrubbed... URL: From william.leslie.ttg at gmail.com Fri Aug 20 10:05:50 2010 From: william.leslie.ttg at gmail.com (William Leslie) Date: Fri, 20 Aug 2010 18:05:50 +1000 Subject: [pypy-dev] JIT'ed function performance degrades In-Reply-To: <23133.28490.qm@web114006.mail.gq1.yahoo.com> References: <23133.28490.qm@web114006.mail.gq1.yahoo.com> Message-ID: On 20 August 2010 16:04, Hart's Antler wrote: > Hi Paolo, > > thanks for your in-depth response, i tried your suggestions and noticed a big speed improvement with no more degrading performance, i didn't realize having more green is bad. ?However it still runs 4x slower than just plain old compiled RPython, i checked if the JIT was really running, and your right its not actually using any JIT'ed code, it only traces and then aborts, though now i can not figure out why it aborts after trying several things. > > I didn't write this as an app-level function because i wanted to understand how the JIT works on a deeper level and with RPython. ?I had seen the blog post before by Carl Friedrich Bolz about JIT'ing and that he was able to speed things up 22x faster than plain RPython translated to C, so that got me curious about the JIT. ?Now i understand that that was an exceptional case, but what other cases might RPython+JIT be useful? ?And its good to see here what if any speed up there will be in the worst case senairo. > > Sorry about all the confusion about numpy being 340x faster, i should have added in that note that i compared numpy fast fourier transform to Rpython direct fourier transform, and direct is known to be hundreds of times slower. ?(numpy lacks a DFT to compare to) > > updated code with only the length as green: http://pastebin.com/DnJikXze > > The jitted function now checks jit.we_are_jitted(), and prints 'unjitted' if there is no jitting. > abort: trace too long seems to happen every trace, so we_are_jitted() is never true, and the 4x overhead compared to compiled RPython is then understandable. > > trace_limit is set to its maximum, so why is it aborting? ?Here is my settings: > ? ? ? ?jitdriver.set_param('threshold', 4) > ? ? ? ?jitdriver.set_param('trace_eagerness', 4) > ? ? ? ?jitdriver.set_param('trace_limit', sys.maxint) > ? ? ? ?jitdriver.set_param('debug', 3) > > > Tracing: ? ? ? ?80 ? ? ?1.019871 > Backend: ? ? ? ?0 ? ? ? 0.000000 > Running asm: ? ? ? ? ? ?0 > Blackhole: ? ? ? ? ? ? ?80 > TOTAL: ? ? ? ? ? ? ? ? ?16.785704 > ops: ? ? ? ? ? ? ? ? ? ?1456160 > recorded ops: ? ? ? ? ? 1200000 > ?calls: ? ? ? ? ? ? ? ?99080 > guards: ? ? ? ? ? ? ? ? 430120 > opt ops: ? ? ? ? ? ? ? ?0 > opt guards: ? ? ? ? ? ? 0 > forcings: ? ? ? ? ? ? ? 0 > abort: trace too long: ?80 > abort: compiling: ? ? ? 0 > abort: vable escape: ? ?0 > nvirtuals: ? ? ? ? ? ? ?0 > nvholes: ? ? ? ? ? ? ? ?0 > nvreused: ? ? ? ? ? ? ? 0 This application probably isn't a very good use for the jit because it has very little control flow. It may unroll the loop, but you're probably not gaining anything there. As long as the methods get inlined (as there is no polymorphic dispatch here that I can see), jit can't improve on this much. What optimisations do you expect it to make? 
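For context, the kind of loop under discussion is essentially the following (a naive O(N^2) DFT sketch in plain Python; the pastebin code works on flat float arrays, but the control-flow shape is the same -- two nested loops and no data-dependent branches):

    import math

    def dft(signal):
        n = len(signal)
        out = []
        for k in range(n):
            re = 0.0
            im = 0.0
            for t in range(n):
                angle = -2.0 * math.pi * k * t / n
                re += signal[t] * math.cos(angle)
                im += signal[t] * math.sin(angle)
            out.append((re, im))
        return out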
-- William Leslie From arigo at tunes.org Fri Aug 20 11:31:58 2010 From: arigo at tunes.org (Armin Rigo) Date: Fri, 20 Aug 2010 11:31:58 +0200 Subject: [pypy-dev] JIT'ed function performance degrades In-Reply-To: <786138.15701.qm@web114011.mail.gq1.yahoo.com> References: <786138.15701.qm@web114011.mail.gq1.yahoo.com> Message-ID: <20100820093158.GA16244@code0.codespeak.net> Hi Hart, On Wed, Aug 18, 2010 at 06:25:02PM -0700, Hart's Antler wrote: > I am starting to learn how to use the JIT, and i'm confused why my > function gets slower over time, twice as slow after running for a few > minutes. Using a virtualizable did speed up my code, but it still has > the degrading performance problem. I have yesterdays SVN and using > 64bit with boehm. I understand boehm is slower, but overall my JIT'ed > function is many times slower than un-jitted, is this expected > behavior from boehm? It seems that there are still issues with the 64-bit JIT -- it could be something along the line of "the guards are not correctly overwritten", or likely something more subtle along these lines, causing more and more assembler to be produced. We have observed "infinite"-looking memory usage for long-running programs, too. Note that in the example you posted, you are doing the common mistake of putting some code (the looping condition) between can_enter_jit and jit_merge_point. We should really do something about checking that people don't do that. It mostly works, except in some cases where it doesn't :-( The issue is more precisely: while x < y: my_jit_driver.jit_merge_point(...) ...loop body... my_jit_driver.can_enter_jit(...) In this case, the "x < y" is evaluated between can_enter_jit and jit_merge_point, and that's the mistake. You should rewrite your examples as: while x < y: my_jit_driver.can_enter_jit(...) my_jit_driver.jit_merge_point(...) ...loop body... A bientot, Armin. From arigo at tunes.org Fri Aug 20 11:45:24 2010 From: arigo at tunes.org (Armin Rigo) Date: Fri, 20 Aug 2010 11:45:24 +0200 Subject: [pypy-dev] JIT'ed function performance degrades In-Reply-To: <23133.28490.qm@web114006.mail.gq1.yahoo.com> References: <23133.28490.qm@web114006.mail.gq1.yahoo.com> Message-ID: <20100820094524.GB16244@code0.codespeak.net> Hi Hart, On Thu, Aug 19, 2010 at 11:04:48PM -0700, Hart's Antler wrote: > I had seen the blog post before by Carl Friedrich Bolz about JIT'ing > and that he was able to speed things up 22x faster than plain RPython > translated to C, so that got me curious about the JIT. You cannot expect any program to get 22x faster with RPython+JIT than it is with just RPython. That would be like saying that any C program can get 22x faster if we apply some special JIT on it. For a general C program, such a statement makes no sense -- no JIT can help. The PyPy JIT can help *only* if the RPython program in question is some kind of interpreter, with a loose definition of interpreter. That's why we can apply the PyPy JIT to the Python interpreter written in RPython; or to some other examples, like Carl Friedrich's blog post about the regular expressions "interpreter". A bientot, Armin. From arigo at tunes.org Fri Aug 20 11:57:21 2010 From: arigo at tunes.org (Armin Rigo) Date: Fri, 20 Aug 2010 11:57:21 +0200 Subject: [pypy-dev] What's wrong with >>> open(?xxx?, ?w?).write(?stuff?) ? 
In-Reply-To: References: Message-ID: <20100820095721.GC16244@code0.codespeak.net> Hi Sakesun, On Thu, Aug 19, 2010 at 11:25:42AM +0700, sakesun roykiatisak wrote: > >>> f = open('xxx', 'w') > >>> f.write('stuff') > >>> del f > > Also, I've tried that with both Jython and IronPython and they all work > fine. I guess that you didn't try exactly the same thing. If I do: arigo at tannit ~ $ jython Jython 2.2.1 on java1.6.0_20 Type "copyright", "credits" or "license" for more information. >>> open('x', 'w').write('hello') >>> Then "cat x" in another terminal shows an empty file. The file "x" is only filled when I exit Jython. It is exactly the same behavior as I get on PyPy. Maybe I missed something, and there is a different way to do things such that it works on Jython but not on PyPy; if so, can you describe it more precisely? Thanks! A bientot, Armin. From donny.viszneki at gmail.com Fri Aug 20 12:23:26 2010 From: donny.viszneki at gmail.com (Donny Viszneki) Date: Fri, 20 Aug 2010 06:23:26 -0400 Subject: [pypy-dev] What's wrong with >>> open(?xxx?, ?w?).write(?stuff?) ? In-Reply-To: <20100820095721.GC16244@code0.codespeak.net> References: <20100820095721.GC16244@code0.codespeak.net> Message-ID: Armin: Sakesun used "del f" and it appears you did not. In Python IIRC, an explicit call to del should kick off the finalizer to flush and close the file! open('x', 'w').write('hello') alone does not imply the file instance (return value of open()) has been finalized because the garbage collector may not have hit it yet. Jython and IronPython are pretty much guaranteed to behave differently under a wide variety of circumstances when it comes to the garbage collector. Do not rely on the garbage collector for program semantics! Because Sakesun has used "del f" it should be quite a concern that the file has not been finalized properly! On Fri, Aug 20, 2010 at 5:57 AM, Armin Rigo wrote: > Hi Sakesun, > > On Thu, Aug 19, 2010 at 11:25:42AM +0700, sakesun roykiatisak wrote: >> >>> f = open('xxx', 'w') >> >>> f.write('stuff') >> >>> del f >> >> Also, I've tried that with both Jython and IronPython and they all work >> fine. > > I guess that you didn't try exactly the same thing. ?If I do: > > ? ?arigo at tannit ~ $ jython > ? ?Jython 2.2.1 on java1.6.0_20 > ? ?Type "copyright", "credits" or "license" for more information. > ? ?>>> open('x', 'w').write('hello') > ? ?>>> > > Then "cat x" in another terminal shows an empty file. ?The file "x" is > only filled when I exit Jython. ?It is exactly the same behavior as I > get on PyPy. ?Maybe I missed something, and there is a different way to > do things such that it works on Jython but not on PyPy; if so, can you > describe it more precisely? ?Thanks! > > > A bientot, > > Armin. > _______________________________________________ > pypy-dev at codespeak.net > http://codespeak.net/mailman/listinfo/pypy-dev > -- http://codebad.com/ From william.leslie.ttg at gmail.com Fri Aug 20 12:32:34 2010 From: william.leslie.ttg at gmail.com (William Leslie) Date: Fri, 20 Aug 2010 20:32:34 +1000 Subject: [pypy-dev] What's wrong with >>> open(?xxx?, ?w?).write(?stuff?) ? In-Reply-To: References: <20100820095721.GC16244@code0.codespeak.net> Message-ID: It seems you too have missed the difference between deleting some reference to the object (as del does) and finalising. On 20/08/2010 8:23 PM, "Donny Viszneki" wrote: Armin: Sakesun used "del f" and it appears you did not. In Python IIRC, an explicit call to del should kick off the finalizer to flush and close the file! 
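A small self-contained check of what `del` actually does (the replies below make the same point): `del` only removes a name binding; when the finalizer runs is a property of the garbage collector, and only CPython's reference counting makes it immediate.

    class Noisy(object):
        def __del__(self):
            print 'finalized'

    x = Noisy()
    y = x
    del x   # prints nothing on any implementation: the object is still reachable via y
    del y   # CPython prints 'finalized' here, when the refcount hits zero;
            # PyPy, Jython and IronPython may only do it at some later GC, if at all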
open('x', 'w').write('hello') alone does not imply the file instance (return value of open()) has been finalized because the garbage collector may not have hit it yet. Jython and IronPython are pretty much guaranteed to behave differently under a wide variety of circumstances when it comes to the garbage collector. Do not rely on the garbage collector for program semantics! Because Sakesun has used "del f" it should be quite a concern that the file has not been finalized properly! On Fri, Aug 20, 2010 at 5:57 AM, Armin Rigo wrote: > Hi Sakesun, > > On Thu, Aug ... -- http://codebad.com/ _______________________________________________ pypy-dev at codespeak.net http://codespeak.net/mailman/... -------------- next part -------------- An HTML attachment was scrubbed... URL: From arigo at tunes.org Fri Aug 20 13:06:49 2010 From: arigo at tunes.org (Armin Rigo) Date: Fri, 20 Aug 2010 13:06:49 +0200 Subject: [pypy-dev] What's wrong with >>> open(?xxx?, ?w?).write(?stuff?) ? In-Reply-To: References: <20100820095721.GC16244@code0.codespeak.net> Message-ID: <20100820110649.GA23268@code0.codespeak.net> Hi Donny, On Fri, Aug 20, 2010 at 06:23:26AM -0400, Donny Viszneki wrote: > Armin: Sakesun used "del f" and it appears you did not. As explained earlier this makes no difference. E.g. in any Python version, the following code would not call the __del__ method of the object x either: >>> x = SomeClassWithADel() >>> y = x >>> del x A bientot, Armin. From p.giarrusso at gmail.com Fri Aug 20 15:39:22 2010 From: p.giarrusso at gmail.com (Paolo Giarrusso) Date: Fri, 20 Aug 2010 15:39:22 +0200 Subject: [pypy-dev] What's wrong with >>> open(?xxx?, ?w?).write(?stuff?) ? In-Reply-To: References: <20100820095721.GC16244@code0.codespeak.net> Message-ID: On Fri, Aug 20, 2010 at 12:23, Donny Viszneki wrote: > Armin: Sakesun used "del f" and it appears you did not. Actually, he didn't either. He said "I think that open(?xxx?, ?w?).write(?stuff?)" is equivalent to using del (which he thought would work), and the equivalence was correct. Anyway, in the _first reply_ message, he realized that using: ipy -c "open(?xxx?, ?w?).write(?stuff?)" jython -c "open(?xxx?, ?w?).write(?stuff?)" made a difference (because the interpreter exited), so that problem was solved. His mail implies that on PyPy he typed the code at the prompt, rather than at -c. > In Python > IIRC, an explicit call to del should kick off the finalizer to flush > and close the file! No, as shown by Armin. del will just clear the reference, which on CPython means decreasing the refcount. Refcounting will then finalize the object immediately, a GC at some later point, if it runs at all - there's no such guarantee on Java and .NET. For Java, that's unless you do special unsafe setup (System.runFinalizersOnExit(), it's discouraged for a number of reasons, see docs). On .NET, I expect a such method to exist, too, since they were so unaware of problems wiith finalizers in .NET 1.0 to give them the syntax of destructors. But .NET 2.0 has SafeHandles, which guarantee release of critical resources if the "finalization" code follows some restriction, using _reference counting_: http://msdn.microsoft.com/en-us/library/system.runtime.interopservices.safehandle.aspx http://msdn.microsoft.com/en-us/library/system.runtime.interopservices.safehandle.dangerousaddref.aspx > open('x', 'w').write('hello') alone does not imply the file instance > (return value of open()) has been finalized because the garbage > collector may not have hit it yet. 
On CPython, you have such an implication, because of refcounting semantics. > On Fri, Aug 20, 2010 at 5:57 AM, Armin Rigo wrote: >> Hi Sakesun, >> >> On Thu, Aug 19, 2010 at 11:25:42AM +0700, sakesun roykiatisak wrote: >>> >>> f = open('xxx', 'w') >>> >>> f.write('stuff') >>> >>> del f >>> >>> Also, I've tried that with both Jython and IronPython and they all work >>> fine. >> >> I guess that you didn't try exactly the same thing. ?If I do: >> >> ? ?arigo at tannit ~ $ jython >> ? ?Jython 2.2.1 on java1.6.0_20 >> ? ?Type "copyright", "credits" or "license" for more information. >> ? ?>>> open('x', 'w').write('hello') >> ? ?>>> >> >> Then "cat x" in another terminal shows an empty file. ?The file "x" is >> only filled when I exit Jython. ?It is exactly the same behavior as I >> get on PyPy. ?Maybe I missed something, and there is a different way to >> do things such that it works on Jython but not on PyPy; if so, can you >> describe it more precisely? ?Thanks! -- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/ From arigo at tunes.org Fri Aug 20 16:06:31 2010 From: arigo at tunes.org (Armin Rigo) Date: Fri, 20 Aug 2010 16:06:31 +0200 Subject: [pypy-dev] What's wrong with >>> open(?xxx?, ?w?).write(?stuff?) ? In-Reply-To: References: <20100820095721.GC16244@code0.codespeak.net> Message-ID: <20100820140631.GA3513@code0.codespeak.net> Hi Donny, On Fri, Aug 20, 2010 at 06:23:26AM -0400, Donny Viszneki wrote: > Armin: Sakesun used "del f" and it appears you did not. In Python > IIRC, an explicit call to del should kick off the finalizer to flush > and close the file! No, you are wrong. Try for example: >>> f = open('xxx') >>> g = f >>> del f After this, 'g' still refers to the file, and it is still open. If you want the file to be flushed and closed, then call 'f.close()' :-) A bientot, Armin. From p.giarrusso at gmail.com Fri Aug 20 19:01:07 2010 From: p.giarrusso at gmail.com (Paolo Giarrusso) Date: Fri, 20 Aug 2010 19:01:07 +0200 Subject: [pypy-dev] gpgpu and pypy In-Reply-To: References: Message-ID: 2010/8/20 Jorge Tim?n : > Hi, I'm just curious about the feasibility of running python code in a gpu > by extending pypy. Disclaimer: I am not a PyPy developer, even if I've been following the project with interest. Nor am I an expert of GPU - I provide links to the literature I've read. Yet, I believe that such an attempt is unlikely to be interesting. Quoting Wikipedia's synthesis: "Unlike CPUs however, GPUs have a parallel throughput architecture that emphasizes executing many concurrent threads slowly, rather than executing a single thread very fast." And significant optimizations are needed anyway to get performance for GPU code (and if you don't need the last bit of performance, why bother with a GPU?), so I think that the need to use a C-like language is the smallest problem. > I don't have the time (and probably the knowledge neither) to develop that > pypy extension, but I just want to know if it's possible. > I'm interested in languages like openCL and nvidia's CUDA because I think > the future of supercomputing is going to be GPGPU. I would like to point out that while for some cases it might be right, the importance of GPGPU is probably often exaggerated: http://portal.acm.org/citation.cfm?id=1816021&coll=GUIDE&dl=GUIDE&CFID=11111111&CFTOKEN=2222222&ret=1# Researchers in the field are mostly aware of the fact that GPGPU is the way to go only for a very restricted category of code. For that code, fine. 
Thus, instead of running Python code in a GPU, designing from scratch an easy way to program a GPU efficiently, for those task, is better, and projects for that already exist (i.e. what you cite). Additionally, it would take probably a different kind of JIT to exploit GPUs. No branch prediction, very small non-coherent caches, no efficient synchronization primitives, as I read from this paper... I'm no expert, but I guess you'd need to rearchitecture from scratch the needed optimizations. And it took 20-30 years to get from the first, slow Lisp (1958) to, say, Self (1991), a landmark in performant high-level languages, derived from SmallTalk. Most of that would have to be redone. So, I guess that the effort to compile Python code for a GPU is not worth it. There might be further reasons due to the kind of code a JIT generates, since a GPU has no branch predictor, no caches, and so on, but I'm no GPU expert and I would have to check again. Finally, for general purpose code, exploiting the big expected number of CPUs on our desktop systems is already a challenge. > There's people working in > bringing GPGPU to python: > > http://mathema.tician.de/software/pyopencl > http://mathema.tician.de/software/pycuda > > Would it be possible to run python code in parallel without the need (for > the developer) of actively parallelizing the code? I would say that Python is not yet the language to use to write efficient parallel code, because of the Global Interpreter Lock (Google for "Python GIL"). The two implementations having no GIL are IronPython (as slow as CPython) and Jython (slower). PyPy has a GIL, and the current focus is not on removing it. Scientific computing uses external libraries (like NumPy) - for the supported algorithms, one could introduce parallelism at that level. If that's enough for your application, good. If you want to write a parallel algorithm in Python, we're not there yet. > I'm not talking about code of hard concurrency, but of code with intrinsic > parallelism (let's say matrix multiplication). Automatic parallelization is hard, see: http://en.wikipedia.org/wiki/Automatic_parallelization Lots of scientists have tried, lots of money has been invested, but it's still hard. The only practical approaches still require the programmer to introduce parallelism, but in ways much simpler than using multithreading directly. Google OpenMP and Cilk. > Would a JIT compilation be capable of detecting parallelism? Summing up what is above, probably not. Moreover, matrix multiplication may not be so easy as one might think. I do not know how to write it for a GPU, but in the end I reference some suggestions from that paper (where it is one of the benchmarks). But here, I explain why writing it for a CPU is complicated. You can multiply two matrixes with a triply nested for, but such an algorithm has poor performance for big matrixes because of bad cache locality. GPUs, according to the above mentioned paper, provide no caches and hides latency in other ways. See here for the two main alternative ideas which allow solving this problem of writing an efficient matrix multiplication algorithm: http://en.wikipedia.org/wiki/Cache_blocking http://en.wikipedia.org/wiki/Cache-oblivious_algorithm Then, you need to parallelize the resulting code yourself, which might or might not be easy (depending on the interactions between the parallel blocks that are found there). 
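As a rough illustration of the cache-blocking idea, a sketch in plain Python (the block size 64 is an arbitrary placeholder; a real implementation tunes it to the cache sizes and would not be written in pure Python in the first place):

    def blocked_matmul(A, B, n, bs=64):
        # C = A * B for n x n matrices stored as lists of lists,
        # computed block by block so each block of B stays in cache while it is reused
        C = [[0.0] * n for _ in range(n)]
        for ii in range(0, n, bs):
            for kk in range(0, n, bs):
                for jj in range(0, n, bs):
                    for i in range(ii, min(ii + bs, n)):
                        for k in range(kk, min(kk + bs, n)):
                            a_ik = A[i][k]
                            row_b = B[k]
                            row_c = C[i]
                            for j in range(jj, min(jj + bs, n)):
                                row_c[j] += a_ik * row_b[j]
        return C

The three outer loops walk over blocks and the three inner loops do an ordinary small multiplication inside one block; the per-block work is independent, which is what makes the parallelization step comparatively easy.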
In that paper, where matrix multiplication is called as SGEMM (the BLAS routine implementing it), they suggest using a cache-blocked version of matrix multiplication for both CPUs and GPUs, and argue that parallelization is then easy. Cheers, -- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/ From jbaker at zyasoft.com Fri Aug 20 20:20:17 2010 From: jbaker at zyasoft.com (Jim Baker) Date: Fri, 20 Aug 2010 12:20:17 -0600 Subject: [pypy-dev] What's wrong with >>> open(?xxx?, ?w?).write(?stuff?) ? In-Reply-To: <20100820140631.GA3513@code0.codespeak.net> References: <20100820095721.GC16244@code0.codespeak.net> <20100820140631.GA3513@code0.codespeak.net> Message-ID: Obviously please close the file, ideally using something like the with-statement or at least finally. But for perhaps the convenience of scripters, and the sorrow of everyone else ;), Jython will close the file upon clean termination of the JVM via registering a closer of such files with Runtime#addShutdownHook This is currently part of the most important outstanding bugin Jython 2.5.2, and something that has to be resolved for 2.5.2 beta 2, because of how it interacts with classloaders and prevents their class GC upon reload (thus potentially exhausting permgen). On Fri, Aug 20, 2010 at 8:06 AM, Armin Rigo wrote: > Hi Donny, > > On Fri, Aug 20, 2010 at 06:23:26AM -0400, Donny Viszneki wrote: > > Armin: Sakesun used "del f" and it appears you did not. In Python > > IIRC, an explicit call to del should kick off the finalizer to flush > > and close the file! > > No, you are wrong. Try for example: > > >>> f = open('xxx') > >>> g = f > >>> del f > > After this, 'g' still refers to the file, and it is still open. > > If you want the file to be flushed and closed, then call 'f.close()' :-) > > > A bientot, > > Armin. > _______________________________________________ > pypy-dev at codespeak.net > http://codespeak.net/mailman/listinfo/pypy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jbaker at zyasoft.com Fri Aug 20 20:25:11 2010 From: jbaker at zyasoft.com (Jim Baker) Date: Fri, 20 Aug 2010 12:25:11 -0600 Subject: [pypy-dev] gpgpu and pypy In-Reply-To: References: Message-ID: Jython single-threaded performance has little to do with a lack of the GIL. Probably the only direct manifestation is seen in the overhead of allocating __dict__ (or dict) objects because Python attributes have volatile memory semantics, which is ensured by the backing of a ConcurrentHashMap, which can be expensive to allocate. There are workarounds. 2010/8/20 Paolo Giarrusso > 2010/8/20 Jorge Tim?n : > > Hi, I'm just curious about the feasibility of running python code in a > gpu > > by extending pypy. > Disclaimer: I am not a PyPy developer, even if I've been following the > project with interest. Nor am I an expert of GPU - I provide links to > the literature I've read. > Yet, I believe that such an attempt is unlikely to be interesting. > Quoting Wikipedia's synthesis: > "Unlike CPUs however, GPUs have a parallel throughput architecture > that emphasizes executing many concurrent threads slowly, rather than > executing a single thread very fast." > And significant optimizations are needed anyway to get performance for > GPU code (and if you don't need the last bit of performance, why > bother with a GPU?), so I think that the need to use a C-like language > is the smallest problem. 
> > > I don't have the time (and probably the knowledge neither) to develop > that > > pypy extension, but I just want to know if it's possible. > > I'm interested in languages like openCL and nvidia's CUDA because I think > > the future of supercomputing is going to be GPGPU. > > I would like to point out that while for some cases it might be right, > the importance of GPGPU is probably often exaggerated: > > > http://portal.acm.org/citation.cfm?id=1816021&coll=GUIDE&dl=GUIDE&CFID=11111111&CFTOKEN=2222222&ret=1# > > Researchers in the field are mostly aware of the fact that GPGPU is > the way to go only for a very restricted category of code. For that > code, fine. > Thus, instead of running Python code in a GPU, designing from scratch > an easy way to program a GPU efficiently, for those task, is better, > and projects for that already exist (i.e. what you cite). > > Additionally, it would take probably a different kind of JIT to > exploit GPUs. No branch prediction, very small non-coherent caches, no > efficient synchronization primitives, as I read from this paper... I'm > no expert, but I guess you'd need to rearchitecture from scratch the > needed optimizations. > And it took 20-30 years to get from the first, slow Lisp (1958) to, > say, Self (1991), a landmark in performant high-level languages, > derived from SmallTalk. Most of that would have to be redone. > > So, I guess that the effort to compile Python code for a GPU is not > worth it. There might be further reasons due to the kind of code a JIT > generates, since a GPU has no branch predictor, no caches, and so on, > but I'm no GPU expert and I would have to check again. > > Finally, for general purpose code, exploiting the big expected number > of CPUs on our desktop systems is already a challenge. > > > There's people working in > > bringing GPGPU to python: > > > > http://mathema.tician.de/software/pyopencl > > http://mathema.tician.de/software/pycuda > > > > Would it be possible to run python code in parallel without the need (for > > the developer) of actively parallelizing the code? > > I would say that Python is not yet the language to use to write > efficient parallel code, because of the Global Interpreter Lock > (Google for "Python GIL"). The two implementations having no GIL are > IronPython (as slow as CPython) and Jython (slower). PyPy has a GIL, > and the current focus is not on removing it. > Scientific computing uses external libraries (like NumPy) - for the > supported algorithms, one could introduce parallelism at that level. > If that's enough for your application, good. > If you want to write a parallel algorithm in Python, we're not there yet. > > > I'm not talking about code of hard concurrency, but of code with > intrinsic > > parallelism (let's say matrix multiplication). > > Automatic parallelization is hard, see: > http://en.wikipedia.org/wiki/Automatic_parallelization > > Lots of scientists have tried, lots of money has been invested, but > it's still hard. > The only practical approaches still require the programmer to > introduce parallelism, but in ways much simpler than using > multithreading directly. Google OpenMP and Cilk. > > > Would a JIT compilation be capable of detecting parallelism? > Summing up what is above, probably not. > > Moreover, matrix multiplication may not be so easy as one might think. > I do not know how to write it for a GPU, but in the end I reference > some suggestions from that paper (where it is one of the benchmarks). 
> But here, I explain why writing it for a CPU is complicated. You can > multiply two matrixes with a triply nested for, but such an algorithm > has poor performance for big matrixes because of bad cache locality. > GPUs, according to the above mentioned paper, provide no caches and > hides latency in other ways. > > See here for the two main alternative ideas which allow solving this > problem of writing an efficient matrix multiplication algorithm: > http://en.wikipedia.org/wiki/Cache_blocking > http://en.wikipedia.org/wiki/Cache-oblivious_algorithm > > Then, you need to parallelize the resulting code yourself, which might > or might not be easy (depending on the interactions between the > parallel blocks that are found there). > In that paper, where matrix multiplication is called as SGEMM (the > BLAS routine implementing it), they suggest using a cache-blocked > version of matrix multiplication for both CPUs and GPUs, and argue > that parallelization is then easy. > > Cheers, > -- > Paolo Giarrusso - Ph.D. Student > http://www.informatik.uni-marburg.de/~pgiarrusso/ > _______________________________________________ > pypy-dev at codespeak.net > http://codespeak.net/mailman/listinfo/pypy-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.giarrusso at gmail.com Fri Aug 20 22:45:21 2010 From: p.giarrusso at gmail.com (Paolo Giarrusso) Date: Fri, 20 Aug 2010 22:45:21 +0200 Subject: [pypy-dev] gpgpu and pypy In-Reply-To: References: Message-ID: 2010/8/20 Jim Baker : > Jython single-threaded performance has little to do with a lack of the GIL. Never implied that - I do believe that a GIL-less fast Python is possible. I just meant we don't have one yet. > Probably the only direct manifestation is seen in the overhead of allocating > __dict__ (or dict) objects because Python attributes have volatile memory > semantics Uh? "Jython memory model" doesn't seem to find anything. Is there any docs on this, with the rationale for the choice you describe? I've only found the Unladen Swallow proposals for a memory model: http://code.google.com/p/unladen-swallow/wiki/MemoryModel (and python-safethread, which I don't like). As a Java programmer using Jython, I wouldn't expect to have any volatile field ever, but I would expect to be able to act on different fields indipendently - the race conditions we have to protect from are the ones on structual modification (unless the table uses open addressing). _This_ can be implemented through ConcurrentHashMap (which also makes all fields volatile), but an implementation not guaranteeing volatile semantics (if possible) would have been equally valid. I am interested because I want to experiment with alternatives. Of course, you can offer stronger semantics, but then you should also advertise that fields are volatile, thus I don't need a lock to pass a reference. > , which is ensured by the backing of a ConcurrentHashMap, which can > be expensive to allocate. There are workarounds. I'm also curious about such workarounds - are they currently implemented or speculations? > 2010/8/20 Paolo Giarrusso >> >> 2010/8/20 Jorge Tim?n : >> > Hi, I'm just curious about the feasibility of running python code in a >> > gpu >> > by extending pypy. >> Disclaimer: I am not a PyPy developer, even if I've been following the >> project with interest. Nor am I an expert of GPU - I provide links to >> the literature I've read. >> Yet, I believe that such an attempt is unlikely to be interesting. 
>> Quoting Wikipedia's synthesis: >> "Unlike CPUs however, GPUs have a parallel throughput architecture >> that emphasizes executing many concurrent threads slowly, rather than >> executing a single thread very fast." >> And significant optimizations are needed anyway to get performance for >> GPU code (and if you don't need the last bit of performance, why >> bother with a GPU?), so I think that the need to use a C-like language >> is the smallest problem. >> >> > I don't have the time (and probably the knowledge neither) to develop >> > that >> > pypy extension, but I just want to know if it's possible. >> > I'm interested in languages like openCL and nvidia's CUDA because I >> > think >> > the future of supercomputing is going to be GPGPU. >> >> I would like to point out that while for some cases it might be right, >> the importance of GPGPU is probably often exaggerated: >> >> >> http://portal.acm.org/citation.cfm?id=1816021&coll=GUIDE&dl=GUIDE&CFID=11111111&CFTOKEN=2222222&ret=1# >> >> Researchers in the field are mostly aware of the fact that GPGPU is >> the way to go only for a very restricted category of code. For that >> code, fine. >> Thus, instead of running Python code in a GPU, designing from scratch >> an easy way to program a GPU efficiently, for those task, is better, >> and projects for that already exist (i.e. what you cite). >> >> Additionally, it would take probably a different kind of JIT to >> exploit GPUs. No branch prediction, very small non-coherent caches, no >> efficient synchronization primitives, as I read from this paper... I'm >> no expert, but I guess you'd need to rearchitecture from scratch the >> needed optimizations. >> And it took 20-30 years to get from the first, slow Lisp (1958) to, >> say, Self (1991), a landmark in performant high-level languages, >> derived from SmallTalk. Most of that would have to be redone. >> >> So, I guess that the effort to compile Python code for a GPU is not >> worth it. There might be further reasons due to the kind of code a JIT >> generates, since a GPU has no branch predictor, no caches, and so on, >> but I'm no GPU expert and I would have to check again. >> >> Finally, for general purpose code, exploiting the big expected number >> of CPUs on our desktop systems is already a challenge. >> >> > There's people working in >> > bringing GPGPU to python: >> > >> > http://mathema.tician.de/software/pyopencl >> > http://mathema.tician.de/software/pycuda >> > >> > Would it be possible to run python code in parallel without the need >> > (for >> > the developer) of actively parallelizing the code? >> >> I would say that Python is not yet the language to use to write >> efficient parallel code, because of the Global Interpreter Lock >> (Google for "Python GIL"). The two implementations having no GIL are >> IronPython (as slow as CPython) and Jython (slower). PyPy has a GIL, >> and the current focus is not on removing it. >> Scientific computing uses external libraries (like NumPy) - for the >> supported algorithms, one could introduce parallelism at that level. >> If that's enough for your application, good. >> If you want to write a parallel algorithm in Python, we're not there yet. >> >> > I'm not talking about code of hard concurrency, but of code with >> > intrinsic >> > parallelism (let's say matrix multiplication). >> >> Automatic parallelization is hard, see: >> http://en.wikipedia.org/wiki/Automatic_parallelization >> >> Lots of scientists have tried, lots of money has been invested, but >> it's still hard. 
>> The only practical approaches still require the programmer to >> introduce parallelism, but in ways much simpler than using >> multithreading directly. Google OpenMP and Cilk. >> >> > Would a JIT compilation be capable of detecting parallelism? >> Summing up what is above, probably not. >> >> Moreover, matrix multiplication may not be so easy as one might think. >> I do not know how to write it for a GPU, but in the end I reference >> some suggestions from that paper (where it is one of the benchmarks). >> But here, I explain why writing it for a CPU is complicated. You can >> multiply two matrixes with a triply nested for, but such an algorithm >> has poor performance for big matrixes because of bad cache locality. >> GPUs, according to the above mentioned paper, provide no caches and >> hides latency in other ways. >> >> See here for the two main alternative ideas which allow solving this >> problem of writing an efficient matrix multiplication algorithm: >> http://en.wikipedia.org/wiki/Cache_blocking >> http://en.wikipedia.org/wiki/Cache-oblivious_algorithm >> >> Then, you need to parallelize the resulting code yourself, which might >> or might not be easy (depending on the interactions between the >> parallel blocks that are found there). >> In that paper, where matrix multiplication is called as SGEMM (the >> BLAS routine implementing it), they suggest using a cache-blocked >> version of matrix multiplication for both CPUs and GPUs, and argue >> that parallelization is then easy. >> >> Cheers, >> -- >> Paolo Giarrusso - Ph.D. Student >> http://www.informatik.uni-marburg.de/~pgiarrusso/ >> _______________________________________________ >> pypy-dev at codespeak.net >> http://codespeak.net/mailman/listinfo/pypy-dev > -- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/ From fijall at gmail.com Fri Aug 20 22:51:42 2010 From: fijall at gmail.com (Maciej Fijalkowski) Date: Fri, 20 Aug 2010 22:51:42 +0200 Subject: [pypy-dev] gpgpu and pypy In-Reply-To: References: Message-ID: 2010/8/20 Paolo Giarrusso : > 2010/8/20 Jorge Tim?n : >> Hi, I'm just curious about the feasibility of running python code in a gpu >> by extending pypy. > Disclaimer: I am not a PyPy developer, even if I've been following the > project with interest. Nor am I an expert of GPU - I provide links to > the literature I've read. > Yet, I believe that such an attempt is unlikely to be interesting. > Quoting Wikipedia's synthesis: > "Unlike CPUs however, GPUs have a parallel throughput architecture > that emphasizes executing many concurrent threads slowly, rather than > executing a single thread very fast." > And significant optimizations are needed anyway to get performance for > GPU code (and if you don't need the last bit of performance, why > bother with a GPU?), so I think that the need to use a C-like language > is the smallest problem. > >> I don't have the time (and probably the knowledge neither) to develop that >> pypy extension, but I just want to know if it's possible. >> I'm interested in languages like openCL and nvidia's CUDA because I think >> the future of supercomputing is going to be GPGPU. Python is a very different language than CUDA or openCL, hence it's not completely to map python's semantics to something that will make sense for GPU. 
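To illustrate that gap (example written from memory, so treat the details as approximate): even with the pycuda bindings mentioned elsewhere in this thread, the part that actually runs on the GPU is still a CUDA C kernel embedded as a string in the Python program, roughly like the standard pycuda example:

    import numpy
    import pycuda.autoinit            # picks a GPU and sets up a context
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    # The GPU part is plain CUDA C, compiled at run time by pycuda.
    mod = SourceModule("""
    __global__ void add_them(float *dest, float *a, float *b)
    {
        const int i = threadIdx.x;
        dest[i] = a[i] + b[i];
    }
    """)
    add_them = mod.get_function("add_them")

    a = numpy.random.randn(400).astype(numpy.float32)
    b = numpy.random.randn(400).astype(numpy.float32)
    dest = numpy.zeros_like(a)

    # Python only prepares the data and launches the kernel.
    add_them(drv.Out(dest), drv.In(a), drv.In(b), block=(400, 1, 1), grid=(1, 1))

Python drives the computation, but none of Python's semantics (objects, dicts, exceptions, the GIL) exist inside the kernel - that is the gap a Python-on-GPU JIT would have to bridge.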
> > I would like to point out that while for some cases it might be right, > the importance of GPGPU is probably often exaggerated: > > http://portal.acm.org/citation.cfm?id=1816021&coll=GUIDE&dl=GUIDE&CFID=11111111&CFTOKEN=2222222&ret=1# > > Researchers in the field are mostly aware of the fact that GPGPU is > the way to go only for a very restricted category of code. For that > code, fine. > Thus, instead of running Python code in a GPU, designing from scratch > an easy way to program a GPU efficiently, for those task, is better, > and projects for that already exist (i.e. what you cite). > > Additionally, it would take probably a different kind of JIT to > exploit GPUs. No branch prediction, very small non-coherent caches, no > efficient synchronization primitives, as I read from this paper... I'm > no expert, but I guess you'd need to rearchitecture from scratch the > needed optimizations. > And it took 20-30 years to get from the first, slow Lisp (1958) to, > say, Self (1991), a landmark in performant high-level languages, > derived from SmallTalk. Most of that would have to be redone. > > So, I guess that the effort to compile Python code for a GPU is not > worth it. There might be further reasons due to the kind of code a JIT > generates, since a GPU has no branch predictor, no caches, and so on, > but I'm no GPU expert and I would have to check again. > > Finally, for general purpose code, exploiting the big expected number > of CPUs on our desktop systems is already a challenge. > >> There's people working in >> bringing GPGPU to python: >> >> http://mathema.tician.de/software/pyopencl >> http://mathema.tician.de/software/pycuda >> >> Would it be possible to run python code in parallel without the need (for >> the developer) of actively parallelizing the code? > > I would say that Python is not yet the language to use to write > efficient parallel code, because of the Global Interpreter Lock > (Google for "Python GIL"). The two implementations having no GIL are > IronPython (as slow as CPython) and Jython (slower). PyPy has a GIL, > and the current focus is not on removing it. > Scientific computing uses external libraries (like NumPy) - for the > supported algorithms, one could introduce parallelism at that level. > If that's enough for your application, good. > If you want to write a parallel algorithm in Python, we're not there yet. > >> I'm not talking about code of hard concurrency, but of code with intrinsic >> parallelism (let's say matrix multiplication). > > Automatic parallelization is hard, see: > http://en.wikipedia.org/wiki/Automatic_parallelization > > Lots of scientists have tried, lots of money has been invested, but > it's still hard. > The only practical approaches still require the programmer to > introduce parallelism, but in ways much simpler than using > multithreading directly. Google OpenMP and Cilk. > >> Would a JIT compilation be capable of detecting parallelism? > Summing up what is above, probably not. > > Moreover, matrix multiplication may not be so easy as one might think. > I do not know how to write it for a GPU, but in the end I reference > some suggestions from that paper (where it is one of the benchmarks). > But here, I explain why writing it for a CPU is complicated. You can > multiply two matrixes with a triply nested for, but such an algorithm > has poor performance for big matrixes because of bad cache locality. > GPUs, according to the above mentioned paper, provide no caches and > hides latency in other ways. 
> > See here for the two main alternative ideas which allow solving this > problem of writing an efficient matrix multiplication algorithm: > http://en.wikipedia.org/wiki/Cache_blocking > http://en.wikipedia.org/wiki/Cache-oblivious_algorithm > > Then, you need to parallelize the resulting code yourself, which might > or might not be easy (depending on the interactions between the > parallel blocks that are found there). > In that paper, where matrix multiplication is called as SGEMM (the > BLAS routine implementing it), they suggest using a cache-blocked > version of matrix multiplication for both CPUs and GPUs, and argue > that parallelization is then easy. What's interesting in using GPU and a JIT is optimizing numpy vectorized operations to speed up things like big_array_a + big_array_b using SSE and GPU. However, I don't think anyone plans to work on it in a near future and if you don't have time this stays as a topic of interest only :) > > Cheers, > -- > Paolo Giarrusso - Ph.D. Student > http://www.informatik.uni-marburg.de/~pgiarrusso/ > _______________________________________________ > pypy-dev at codespeak.net > http://codespeak.net/mailman/listinfo/pypy-dev From jonah at eecs.berkeley.edu Fri Aug 20 23:05:15 2010 From: jonah at eecs.berkeley.edu (Jeff Anderson-Lee) Date: Fri, 20 Aug 2010 14:05:15 -0700 Subject: [pypy-dev] gpgpu and pypy In-Reply-To: References: Message-ID: <4C6EEE0B.7060500@eecs.berkeley.edu> On 8/20/2010 1:51 PM, Maciej Fijalkowski wrote: > 2010/8/20 Paolo Giarrusso: >> 2010/8/20 Jorge Tim?n: >>> Hi, I'm just curious about the feasibility of running python code in a gpu >>> by extending pypy. >> Disclaimer: I am not a PyPy developer, even if I've been following the >> project with interest. Nor am I an expert of GPU - I provide links to >> the literature I've read. >> Yet, I believe that such an attempt is unlikely to be interesting. >> Quoting Wikipedia's synthesis: >> "Unlike CPUs however, GPUs have a parallel throughput architecture >> that emphasizes executing many concurrent threads slowly, rather than >> executing a single thread very fast." >> And significant optimizations are needed anyway to get performance for >> GPU code (and if you don't need the last bit of performance, why >> bother with a GPU?), so I think that the need to use a C-like language >> is the smallest problem. >> >>> I don't have the time (and probably the knowledge neither) to develop that >>> pypy extension, but I just want to know if it's possible. >>> I'm interested in languages like openCL and nvidia's CUDA because I think >>> the future of supercomputing is going to be GPGPU. > Python is a very different language than CUDA or openCL, hence it's > not completely to map python's semantics to something that will make > sense for GPU. Try googling: copperhead cuda Also look at: http://code.google.com/p/copperhead/wiki/Installing From fijall at gmail.com Fri Aug 20 23:18:12 2010 From: fijall at gmail.com (Maciej Fijalkowski) Date: Fri, 20 Aug 2010 23:18:12 +0200 Subject: [pypy-dev] gpgpu and pypy In-Reply-To: <4C6EEE0B.7060500@eecs.berkeley.edu> References: <4C6EEE0B.7060500@eecs.berkeley.edu> Message-ID: On Fri, Aug 20, 2010 at 11:05 PM, Jeff Anderson-Lee wrote: > ?On 8/20/2010 1:51 PM, Maciej Fijalkowski wrote: >> 2010/8/20 Paolo Giarrusso: >>> 2010/8/20 Jorge Tim?n: >>>> Hi, I'm just curious about the feasibility of running python code in a gpu >>>> by extending pypy. >>> Disclaimer: I am not a PyPy developer, even if I've been following the >>> project with interest. 
Nor am I an expert of GPU - I provide links to >>> the literature I've read. >>> Yet, I believe that such an attempt is unlikely to be interesting. >>> Quoting Wikipedia's synthesis: >>> "Unlike CPUs however, GPUs have a parallel throughput architecture >>> that emphasizes executing many concurrent threads slowly, rather than >>> executing a single thread very fast." >>> And significant optimizations are needed anyway to get performance for >>> GPU code (and if you don't need the last bit of performance, why >>> bother with a GPU?), so I think that the need to use a C-like language >>> is the smallest problem. >>> >>>> I don't have the time (and probably the knowledge neither) to develop that >>>> pypy extension, but I just want to know if it's possible. >>>> I'm interested in languages like openCL and nvidia's CUDA because I think >>>> the future of supercomputing is going to be GPGPU. >> Python is a very different language than CUDA or openCL, hence it's >> not completely to map python's semantics to something that will make >> sense for GPU. > Try googling: copperhead cuda > Also look at: > > http://code.google.com/p/copperhead/wiki/Installing > What's the point of posting here project which has not released any code? From jbaker at zyasoft.com Fri Aug 20 23:27:20 2010 From: jbaker at zyasoft.com (Jim Baker) Date: Fri, 20 Aug 2010 15:27:20 -0600 Subject: [pypy-dev] gpgpu and pypy In-Reply-To: References: Message-ID: The Unladen Swallow doc, which was derived from a PEP that Jeff proposed, seems to be a fair descriptive outline of Python memory models in general, and Jython's in specific. Obviously the underlying implementation in the JVM is happens-before consistency; everything else derives from there. The CHM provides additional consistency constraints that should imply sequential consistency for a (vast) subset of Python programs. However, I can readily construct a program that violates sequential consistency: maybe it uses slots (stored in a Java array), or the array module (which also just wraps Java arrays), or by accesses local variables in a frame from another thread (same storage, same problem). Likewise I can also create Python programs that access Java classes (since this is Jython!), and they too will only see happens-before consistency. Naturally, the workarounds I mentioned for improving performance in object allocation all rely on not using CHM and its (modestly) expensive semantics. So this would mean using a Java class in some way, possibly a HashMap (especially one that's been exposed through our type expose mechanism to avoid reflection overhead), or directly using a Java class of some kind (again exposing is best, much like are builtin types like PyInteger), possibly with all fields marked as volatile. Hope this helps! If you are interested in studying this problem in more depth for Jython, or other implementations, and the implications of our hybrid model, it would certainly be most welcome. Unfortunately, it's not something that Jython development itself will be working on (standard time constraints apply here). - Jim 2010/8/20 Paolo Giarrusso > 2010/8/20 Jim Baker : > > Jython single-threaded performance has little to do with a lack of the > GIL. > > Never implied that - I do believe that a GIL-less fast Python is > possible. I just meant we don't have one yet. > > > Probably the only direct manifestation is seen in the overhead of > allocating > > __dict__ (or dict) objects because Python attributes have volatile memory > > semantics > Uh? 
"Jython memory model" doesn't seem to find anything. Is there any > docs on this, with the rationale for the choice you describe? > > I've only found the Unladen Swallow proposals for a memory model: > http://code.google.com/p/unladen-swallow/wiki/MemoryModel (and > python-safethread, which I don't like). > > As a Java programmer using Jython, I wouldn't expect to have any > volatile field ever, but I would expect to be able to act on different > fields indipendently - the race conditions we have to protect from are > the ones on structual modification (unless the table uses open > addressing). > _This_ can be implemented through ConcurrentHashMap (which also makes > all fields volatile), but an implementation not guaranteeing volatile > semantics (if possible) would have been equally valid. > I am interested because I want to experiment with alternatives. > > Of course, you can offer stronger semantics, but then you should also > advertise that fields are volatile, thus I don't need a lock to pass a > reference. > > > , which is ensured by the backing of a ConcurrentHashMap, which can > > be expensive to allocate. There are workarounds. > > I'm also curious about such workarounds - are they currently > implemented or speculations? > > > 2010/8/20 Paolo Giarrusso > >> > >> 2010/8/20 Jorge Tim?n : > >> > Hi, I'm just curious about the feasibility of running python code in a > >> > gpu > >> > by extending pypy. > >> Disclaimer: I am not a PyPy developer, even if I've been following the > >> project with interest. Nor am I an expert of GPU - I provide links to > >> the literature I've read. > >> Yet, I believe that such an attempt is unlikely to be interesting. > >> Quoting Wikipedia's synthesis: > >> "Unlike CPUs however, GPUs have a parallel throughput architecture > >> that emphasizes executing many concurrent threads slowly, rather than > >> executing a single thread very fast." > >> And significant optimizations are needed anyway to get performance for > >> GPU code (and if you don't need the last bit of performance, why > >> bother with a GPU?), so I think that the need to use a C-like language > >> is the smallest problem. > >> > >> > I don't have the time (and probably the knowledge neither) to develop > >> > that > >> > pypy extension, but I just want to know if it's possible. > >> > I'm interested in languages like openCL and nvidia's CUDA because I > >> > think > >> > the future of supercomputing is going to be GPGPU. > >> > >> I would like to point out that while for some cases it might be right, > >> the importance of GPGPU is probably often exaggerated: > >> > >> > >> > http://portal.acm.org/citation.cfm?id=1816021&coll=GUIDE&dl=GUIDE&CFID=11111111&CFTOKEN=2222222&ret=1# > >> > >> Researchers in the field are mostly aware of the fact that GPGPU is > >> the way to go only for a very restricted category of code. For that > >> code, fine. > >> Thus, instead of running Python code in a GPU, designing from scratch > >> an easy way to program a GPU efficiently, for those task, is better, > >> and projects for that already exist (i.e. what you cite). > >> > >> Additionally, it would take probably a different kind of JIT to > >> exploit GPUs. No branch prediction, very small non-coherent caches, no > >> efficient synchronization primitives, as I read from this paper... I'm > >> no expert, but I guess you'd need to rearchitecture from scratch the > >> needed optimizations. 
> >> And it took 20-30 years to get from the first, slow Lisp (1958) to, > >> say, Self (1991), a landmark in performant high-level languages, > >> derived from SmallTalk. Most of that would have to be redone. > >> > >> So, I guess that the effort to compile Python code for a GPU is not > >> worth it. There might be further reasons due to the kind of code a JIT > >> generates, since a GPU has no branch predictor, no caches, and so on, > >> but I'm no GPU expert and I would have to check again. > >> > >> Finally, for general purpose code, exploiting the big expected number > >> of CPUs on our desktop systems is already a challenge. > >> > >> > There's people working in > >> > bringing GPGPU to python: > >> > > >> > http://mathema.tician.de/software/pyopencl > >> > http://mathema.tician.de/software/pycuda > >> > > >> > Would it be possible to run python code in parallel without the need > >> > (for > >> > the developer) of actively parallelizing the code? > >> > >> I would say that Python is not yet the language to use to write > >> efficient parallel code, because of the Global Interpreter Lock > >> (Google for "Python GIL"). The two implementations having no GIL are > >> IronPython (as slow as CPython) and Jython (slower). PyPy has a GIL, > >> and the current focus is not on removing it. > >> Scientific computing uses external libraries (like NumPy) - for the > >> supported algorithms, one could introduce parallelism at that level. > >> If that's enough for your application, good. > >> If you want to write a parallel algorithm in Python, we're not there > yet. > >> > >> > I'm not talking about code of hard concurrency, but of code with > >> > intrinsic > >> > parallelism (let's say matrix multiplication). > >> > >> Automatic parallelization is hard, see: > >> http://en.wikipedia.org/wiki/Automatic_parallelization > >> > >> Lots of scientists have tried, lots of money has been invested, but > >> it's still hard. > >> The only practical approaches still require the programmer to > >> introduce parallelism, but in ways much simpler than using > >> multithreading directly. Google OpenMP and Cilk. > >> > >> > Would a JIT compilation be capable of detecting parallelism? > >> Summing up what is above, probably not. > >> > >> Moreover, matrix multiplication may not be so easy as one might think. > >> I do not know how to write it for a GPU, but in the end I reference > >> some suggestions from that paper (where it is one of the benchmarks). > >> But here, I explain why writing it for a CPU is complicated. You can > >> multiply two matrixes with a triply nested for, but such an algorithm > >> has poor performance for big matrixes because of bad cache locality. > >> GPUs, according to the above mentioned paper, provide no caches and > >> hides latency in other ways. > >> > >> See here for the two main alternative ideas which allow solving this > >> problem of writing an efficient matrix multiplication algorithm: > >> http://en.wikipedia.org/wiki/Cache_blocking > >> http://en.wikipedia.org/wiki/Cache-oblivious_algorithm > >> > >> Then, you need to parallelize the resulting code yourself, which might > >> or might not be easy (depending on the interactions between the > >> parallel blocks that are found there). > >> In that paper, where matrix multiplication is called as SGEMM (the > >> BLAS routine implementing it), they suggest using a cache-blocked > >> version of matrix multiplication for both CPUs and GPUs, and argue > >> that parallelization is then easy. 
> >> > >> Cheers, > >> -- > >> Paolo Giarrusso - Ph.D. Student > >> http://www.informatik.uni-marburg.de/~pgiarrusso/ > >> _______________________________________________ > >> pypy-dev at codespeak.net > >> http://codespeak.net/mailman/listinfo/pypy-dev > > > > > > -- > Paolo Giarrusso - Ph.D. Student > http://www.informatik.uni-marburg.de/~pgiarrusso/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonah at eecs.berkeley.edu Fri Aug 20 23:28:14 2010 From: jonah at eecs.berkeley.edu (Jeff Anderson-Lee) Date: Fri, 20 Aug 2010 14:28:14 -0700 Subject: [pypy-dev] gpgpu and pypy In-Reply-To: References: <4C6EEE0B.7060500@eecs.berkeley.edu> Message-ID: <4C6EF36E.5000507@eecs.berkeley.edu> On 8/20/2010 2:18 PM, Maciej Fijalkowski wrote: > On Fri, Aug 20, 2010 at 11:05 PM, Jeff Anderson-Lee > wrote: >> On 8/20/2010 1:51 PM, Maciej Fijalkowski wrote: >>> 2010/8/20 Paolo Giarrusso: >>>> 2010/8/20 Jorge Tim?n: >>>>> Hi, I'm just curious about the feasibility of running python code in a gpu >>>>> by extending pypy. >>>> Disclaimer: I am not a PyPy developer, even if I've been following the >>>> project with interest. Nor am I an expert of GPU - I provide links to >>>> the literature I've read. >>>> Yet, I believe that such an attempt is unlikely to be interesting. >>>> Quoting Wikipedia's synthesis: >>>> "Unlike CPUs however, GPUs have a parallel throughput architecture >>>> that emphasizes executing many concurrent threads slowly, rather than >>>> executing a single thread very fast." >>>> And significant optimizations are needed anyway to get performance for >>>> GPU code (and if you don't need the last bit of performance, why >>>> bother with a GPU?), so I think that the need to use a C-like language >>>> is the smallest problem. >>>> >>>>> I don't have the time (and probably the knowledge neither) to develop that >>>>> pypy extension, but I just want to know if it's possible. >>>>> I'm interested in languages like openCL and nvidia's CUDA because I think >>>>> the future of supercomputing is going to be GPGPU. >>> Python is a very different language than CUDA or openCL, hence it's >>> not completely to map python's semantics to something that will make >>> sense for GPU. >> Try googling: copperhead cuda >> Also look at: >> >> http://code.google.com/p/copperhead/wiki/Installing >> > What's the point of posting here project which has not released any code? 1) He is packaging it up for release this month: > Comment by bryan.catanzaro > , Aug 05, 2010 > > Before the end of August. I'm working on packaging it up right now. =) > 2) Bryan's got a good head on his shoulders and has been working on this problem or some time. Rather than (or at least before) starting off in a completely new direction, its worth looking at something that has been in the works for a while now and is attaining some maturity. 3) You are welcome to ignore it, but some folks might be interested, and at least they now know it is there and where to look for more information and forthcoming code. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncbray at gmail.com Sat Aug 21 00:46:53 2010 From: ncbray at gmail.com (Nick Bray) Date: Fri, 20 Aug 2010 17:46:53 -0500 Subject: [pypy-dev] gpgpu and pypy In-Reply-To: References: Message-ID: I can't speak for GPGPU, but I have compiled a subset of Python onto the GPU for real-time rendering. 
The subset is a little broader than RPython in some ways (for example, attributes are semantically identical to Python) and a little narrower in some ways (many forms of recursion are disallowed.) This big idea is that it allows you to create a real-time rendering system with a single code base, and transparently share functions and data structures between the CPU and GPU. http://www.ncbray.com/pystream.html http://www.ncbray.com/ncbray-dissertation.pdf It's at least ~100,000x faster than interpreting Python on the CPU. "At least" because the measurements neglect doing things on the CPU like texture sampling. This speedup is pretty obscene, but if you break it down it isn't too unbelievable... 100x for interpreted -> compiled, 10x for abstraction overhead of using floats instead of doubles, 100x for using the GPU and using it for a task it was built for. Parallelism issues are sidestepped by explicitly identifying the parallel sections (one function processes every vertex, one function processes every fragment), requiring the parallel sections have no global side effects, and that certain I/O conventions are followed. Sorry, no big answers here - it's essentially Pythonic stream programming. The biggest issues with getting Python onto the GPU is memory. I was actually targeting GLSL, not CUDA (it can't access the full rendering pipeline), so pointers were not available. To work around this, the code is optimized to an extreme degree to remove as many memory operations as possible. The remaining memory operations are emulated by splitting the heap into regions, indirecting through arrays, and copying constant data wherever possible. From what I've seen this is where PyPy would have the most trouble: its analysis algorithms are good enough for inferring types and allowing compilation / translation... they aren't designed to enable aggressive optimization of memory operations (there's not a huge reason to do this if you're translating RPython into C... the C compiler will do it for you). In general, GPU programming doesn't work well with memory access (too many functional units, too little bandwidth). Most of the "C-like" GPU languages are designed to they can easily boil down into code operating out of registers. Python, on the other hand, is addicted to heap memory. Even if you target CUDA, eliminating memory operations will be a huge win. I'll freely admit there's some ugly things going on, such as the lack of recursion, reliance on exhaustive inlining, requiring GPU code follow a specific form, and not working well with container objects in certain situations (it needs to bound the size of the heap). In the end, however, it's a talking dog... the grammar may not be perfect, but the dog talks! If anyone has questions, either private or on the list, I'd be happy to answer them. I have not done enough to advertise my project, and this seems like a good place to start. - Nick Bray 2010/8/20 Paolo Giarrusso : > 2010/8/20 Jorge Tim?n : >> Hi, I'm just curious about the feasibility of running python code in a gpu >> by extending pypy. > Disclaimer: I am not a PyPy developer, even if I've been following the > project with interest. Nor am I an expert of GPU - I provide links to > the literature I've read. > Yet, I believe that such an attempt is unlikely to be interesting. > Quoting Wikipedia's synthesis: > "Unlike CPUs however, GPUs have a parallel throughput architecture > that emphasizes executing many concurrent threads slowly, rather than > executing a single thread very fast." 
> And significant optimizations are needed anyway to get performance for > GPU code (and if you don't need the last bit of performance, why > bother with a GPU?), so I think that the need to use a C-like language > is the smallest problem. > >> I don't have the time (and probably the knowledge neither) to develop that >> pypy extension, but I just want to know if it's possible. >> I'm interested in languages like openCL and nvidia's CUDA because I think >> the future of supercomputing is going to be GPGPU. > > I would like to point out that while for some cases it might be right, > the importance of GPGPU is probably often exaggerated: > > http://portal.acm.org/citation.cfm?id=1816021&coll=GUIDE&dl=GUIDE&CFID=11111111&CFTOKEN=2222222&ret=1# > > Researchers in the field are mostly aware of the fact that GPGPU is > the way to go only for a very restricted category of code. For that > code, fine. > Thus, instead of running Python code in a GPU, designing from scratch > an easy way to program a GPU efficiently, for those task, is better, > and projects for that already exist (i.e. what you cite). > > Additionally, it would take probably a different kind of JIT to > exploit GPUs. No branch prediction, very small non-coherent caches, no > efficient synchronization primitives, as I read from this paper... I'm > no expert, but I guess you'd need to rearchitecture from scratch the > needed optimizations. > And it took 20-30 years to get from the first, slow Lisp (1958) to, > say, Self (1991), a landmark in performant high-level languages, > derived from SmallTalk. Most of that would have to be redone. > > So, I guess that the effort to compile Python code for a GPU is not > worth it. There might be further reasons due to the kind of code a JIT > generates, since a GPU has no branch predictor, no caches, and so on, > but I'm no GPU expert and I would have to check again. > > Finally, for general purpose code, exploiting the big expected number > of CPUs on our desktop systems is already a challenge. > >> There's people working in >> bringing GPGPU to python: >> >> http://mathema.tician.de/software/pyopencl >> http://mathema.tician.de/software/pycuda >> >> Would it be possible to run python code in parallel without the need (for >> the developer) of actively parallelizing the code? > > I would say that Python is not yet the language to use to write > efficient parallel code, because of the Global Interpreter Lock > (Google for "Python GIL"). The two implementations having no GIL are > IronPython (as slow as CPython) and Jython (slower). PyPy has a GIL, > and the current focus is not on removing it. > Scientific computing uses external libraries (like NumPy) - for the > supported algorithms, one could introduce parallelism at that level. > If that's enough for your application, good. > If you want to write a parallel algorithm in Python, we're not there yet. > >> I'm not talking about code of hard concurrency, but of code with intrinsic >> parallelism (let's say matrix multiplication). > > Automatic parallelization is hard, see: > http://en.wikipedia.org/wiki/Automatic_parallelization > > Lots of scientists have tried, lots of money has been invested, but > it's still hard. > The only practical approaches still require the programmer to > introduce parallelism, but in ways much simpler than using > multithreading directly. Google OpenMP and Cilk. > >> Would a JIT compilation be capable of detecting parallelism? > Summing up what is above, probably not. 
> > Moreover, matrix multiplication may not be so easy as one might think. > I do not know how to write it for a GPU, but in the end I reference > some suggestions from that paper (where it is one of the benchmarks). > But here, I explain why writing it for a CPU is complicated. You can > multiply two matrixes with a triply nested for, but such an algorithm > has poor performance for big matrixes because of bad cache locality. > GPUs, according to the above mentioned paper, provide no caches and > hides latency in other ways. > > See here for the two main alternative ideas which allow solving this > problem of writing an efficient matrix multiplication algorithm: > http://en.wikipedia.org/wiki/Cache_blocking > http://en.wikipedia.org/wiki/Cache-oblivious_algorithm > > Then, you need to parallelize the resulting code yourself, which might > or might not be easy (depending on the interactions between the > parallel blocks that are found there). > In that paper, where matrix multiplication is called as SGEMM (the > BLAS routine implementing it), they suggest using a cache-blocked > version of matrix multiplication for both CPUs and GPUs, and argue > that parallelization is then easy. > > Cheers, > -- > Paolo Giarrusso - Ph.D. Student > http://www.informatik.uni-marburg.de/~pgiarrusso/ > _______________________________________________ > pypy-dev at codespeak.net > http://codespeak.net/mailman/listinfo/pypy-dev From p.giarrusso at gmail.com Sat Aug 21 01:46:28 2010 From: p.giarrusso at gmail.com (Paolo Giarrusso) Date: Sat, 21 Aug 2010 01:46:28 +0200 Subject: [pypy-dev] gpgpu and pypy In-Reply-To: References: Message-ID: 2010/8/20 Jim Baker : > The Unladen Swallow doc, which was derived from a PEP that Jeff proposed, > seems to be a fair descriptive outline of Python memory models in general, > and Jython's in specific. > Obviously the underlying implementation in the JVM is happens-before > consistency; everything else derives from there. The CHM ?provides > additional consistency constraints that should imply sequential consistency > for a (vast) subset of Python programs. However, I can readily construct a > program that violates sequential consistency: maybe it uses slots (stored in > a Java array), or the array module (which also just wraps Java arrays), or > by accesses local variables in a frame from another thread (same storage, > same problem). Likewise I can also create Python programs that access Java > classes (since this is Jython!), and they too will only see happens-before > consistency. OK, I guess that volatile semantics for fields were just a side effect. As far as I can see, you get sequential consistency only in practice, not in theory - you have happens-before edges only when a reader and a writer touch the same field. In practice, the few cases where it matters can't apply here as far as I know, because a hash function decides to which of the submaps a mapping belongs. Your mention of slots is very cool! You made me recall that once you get shadow classes in Python, you can not only do inline caching, but you also have the _same_ object layout as in slots, because adding a member causes a hidden class transition, getting rid of any kind of dictionary _after compilation_. Two exceptions: * an immutable dictionary mapping field names to offsets is used both during JIT compilation and when inline caching fails, for * a fallback case for when __dict__ is used, I guess, is needed. 
Not necessarily a dictionary must be used though: one could also make __dict__ usage just cause class transitions. * beyond a certain member count, i.e., if __dict__ is used as a general-purpose dictionary, one might want to switch back to a dictionary representation. This only applies if this is done in Pythonic code (guess not) - I remember this case from V8, for JavaScript, where the expected usage is different. > Naturally, the workarounds I mentioned for improving performance in object > allocation all rely on not using CHM and its (modestly) expensive semantics. > So this would mean using a Java class in some way, possibly a HashMap > (especially one that's been exposed through our type expose mechanism to > avoid reflection overhead), or directly using a Java class of some kind > (again exposing is best, much like are builtin types like PyInteger), > possibly with all fields marked as volatile. > Hope this helps! If you are interested in studying this problem in more > depth for Jython, or other implementations, and the implications of our > hybrid model, it would certainly be most welcome. Unfortunately, it's not > something that Jython development itself will be working on (standard time > constraints apply here). Such constraints apply to me too - but I hope this to work on that. > - Jim > 2010/8/20 Paolo Giarrusso >> >> 2010/8/20 Jim Baker : >> > Jython single-threaded performance has little to do with a lack of the >> > GIL. >> >> Never implied that - I do believe that a GIL-less fast Python is >> possible. I just meant we don't have one yet. >> >> > Probably the only direct manifestation is seen in the overhead of >> > allocating >> > __dict__ (or dict) objects because Python attributes have volatile >> > memory >> > semantics >> Uh? "Jython memory model" doesn't seem to find anything. Is there any >> docs on this, with the rationale for the choice you describe? >> >> I've only found the Unladen Swallow proposals for a memory model: >> http://code.google.com/p/unladen-swallow/wiki/MemoryModel (and >> python-safethread, which I don't like). >> >> As a Java programmer using Jython, I wouldn't expect to have any >> volatile field ever, but I would expect to be able to act on different >> fields indipendently - the race conditions we have to protect from are >> the ones on structual modification (unless the table uses open >> addressing). >> _This_ can be implemented through ConcurrentHashMap (which also makes >> all fields volatile), but an implementation not guaranteeing volatile >> semantics (if possible) would have been equally valid. >> I am interested because I want to experiment with alternatives. >> >> Of course, you can offer stronger semantics, but then you should also >> advertise that fields are volatile, thus I don't need a lock to pass a >> reference. >> >> > , which is ensured by the backing of a ConcurrentHashMap, which can >> > be expensive to allocate. There are workarounds. >> >> I'm also curious about such workarounds - are they currently >> implemented or speculations? >> >> > 2010/8/20 Paolo Giarrusso >> >> >> >> 2010/8/20 Jorge Tim?n : >> >> > Hi, I'm just curious about the feasibility of running python code in >> >> > a >> >> > gpu >> >> > by extending pypy. >> >> Disclaimer: I am not a PyPy developer, even if I've been following the >> >> project with interest. Nor am I an expert of GPU - I provide links to >> >> the literature I've read. >> >> Yet, I believe that such an attempt is unlikely to be interesting. 
>> >> Quoting Wikipedia's synthesis: >> >> "Unlike CPUs however, GPUs have a parallel throughput architecture >> >> that emphasizes executing many concurrent threads slowly, rather than >> >> executing a single thread very fast." >> >> And significant optimizations are needed anyway to get performance for >> >> GPU code (and if you don't need the last bit of performance, why >> >> bother with a GPU?), so I think that the need to use a C-like language >> >> is the smallest problem. >> >> >> >> > I don't have the time (and probably the knowledge neither) to develop >> >> > that >> >> > pypy extension, but I just want to know if it's possible. >> >> > I'm interested in languages like openCL and nvidia's CUDA because I >> >> > think >> >> > the future of supercomputing is going to be GPGPU. >> >> >> >> I would like to point out that while for some cases it might be right, >> >> the importance of GPGPU is probably often exaggerated: >> >> >> >> >> >> >> >> http://portal.acm.org/citation.cfm?id=1816021&coll=GUIDE&dl=GUIDE&CFID=11111111&CFTOKEN=2222222&ret=1# >> >> >> >> Researchers in the field are mostly aware of the fact that GPGPU is >> >> the way to go only for a very restricted category of code. For that >> >> code, fine. >> >> Thus, instead of running Python code in a GPU, designing from scratch >> >> an easy way to program a GPU efficiently, for those task, is better, >> >> and projects for that already exist (i.e. what you cite). >> >> >> >> Additionally, it would take probably a different kind of JIT to >> >> exploit GPUs. No branch prediction, very small non-coherent caches, no >> >> efficient synchronization primitives, as I read from this paper... I'm >> >> no expert, but I guess you'd need to rearchitecture from scratch the >> >> needed optimizations. >> >> And it took 20-30 years to get from the first, slow Lisp (1958) to, >> >> say, Self (1991), a landmark in performant high-level languages, >> >> derived from SmallTalk. Most of that would have to be redone. >> >> >> >> So, I guess that the effort to compile Python code for a GPU is not >> >> worth it. There might be further reasons due to the kind of code a JIT >> >> generates, since a GPU has no branch predictor, no caches, and so on, >> >> but I'm no GPU expert and I would have to check again. >> >> >> >> Finally, for general purpose code, exploiting the big expected number >> >> of CPUs on our desktop systems is already a challenge. >> >> >> >> > There's people working in >> >> > bringing GPGPU to python: >> >> > >> >> > http://mathema.tician.de/software/pyopencl >> >> > http://mathema.tician.de/software/pycuda >> >> > >> >> > Would it be possible to run python code in parallel without the need >> >> > (for >> >> > the developer) of actively parallelizing the code? >> >> >> >> I would say that Python is not yet the language to use to write >> >> efficient parallel code, because of the Global Interpreter Lock >> >> (Google for "Python GIL"). The two implementations having no GIL are >> >> IronPython (as slow as CPython) and Jython (slower). PyPy has a GIL, >> >> and the current focus is not on removing it. >> >> Scientific computing uses external libraries (like NumPy) - for the >> >> supported algorithms, one could introduce parallelism at that level. >> >> If that's enough for your application, good. >> >> If you want to write a parallel algorithm in Python, we're not there >> >> yet. 
>> >> >> >> > I'm not talking about code of hard concurrency, but of code with >> >> > intrinsic >> >> > parallelism (let's say matrix multiplication). >> >> >> >> Automatic parallelization is hard, see: >> >> http://en.wikipedia.org/wiki/Automatic_parallelization >> >> >> >> Lots of scientists have tried, lots of money has been invested, but >> >> it's still hard. >> >> The only practical approaches still require the programmer to >> >> introduce parallelism, but in ways much simpler than using >> >> multithreading directly. Google OpenMP and Cilk. >> >> >> >> > Would a JIT compilation be capable of detecting parallelism? >> >> Summing up what is above, probably not. >> >> >> >> Moreover, matrix multiplication may not be so easy as one might think. >> >> I do not know how to write it for a GPU, but in the end I reference >> >> some suggestions from that paper (where it is one of the benchmarks). >> >> But here, I explain why writing it for a CPU is complicated. You can >> >> multiply two matrixes with a triply nested for, but such an algorithm >> >> has poor performance for big matrixes because of bad cache locality. >> >> GPUs, according to the above mentioned paper, provide no caches and >> >> hides latency in other ways. >> >> >> >> See here for the two main alternative ideas which allow solving this >> >> problem of writing an efficient matrix multiplication algorithm: >> >> http://en.wikipedia.org/wiki/Cache_blocking >> >> http://en.wikipedia.org/wiki/Cache-oblivious_algorithm >> >> >> >> Then, you need to parallelize the resulting code yourself, which might >> >> or might not be easy (depending on the interactions between the >> >> parallel blocks that are found there). >> >> In that paper, where matrix multiplication is called as SGEMM (the >> >> BLAS routine implementing it), they suggest using a cache-blocked >> >> version of matrix multiplication for both CPUs and GPUs, and argue >> >> that parallelization is then easy. >> >> >> >> Cheers, >> >> -- >> >> Paolo Giarrusso - Ph.D. Student >> >> http://www.informatik.uni-marburg.de/~pgiarrusso/ >> >> _______________________________________________ >> >> pypy-dev at codespeak.net >> >> http://codespeak.net/mailman/listinfo/pypy-dev >> > >> >> >> >> -- >> Paolo Giarrusso - Ph.D. Student >> http://www.informatik.uni-marburg.de/~pgiarrusso/ > > -- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/ From sakesun at gmail.com Sat Aug 21 05:20:11 2010 From: sakesun at gmail.com (sakesun roykiatisak) Date: Sat, 21 Aug 2010 10:20:11 +0700 Subject: [pypy-dev] What's wrong with >>> open(?xxx?, ?w?).write(?stuff?) ? In-Reply-To: References: <20100820095721.GC16244@code0.codespeak.net> Message-ID: This discussion is getting a little too long than necessary, at least for me. :) Most of pypy talk video is in pretty poor recording quality. Most of the time I try to discern barely from the slides. I always understand the difference between resource lifetime and object lifetime. Actually, in my most recent years, my sole python interpreter is the non-refcounting IronPython already. And I always wrap file operation inside try/finally or with statement. The problem is the example that claim to cause problem: >>> open('xxx', 'w').write('stuff') I misinterpret that the problem is caused in the "write" methods. The above statement cause no problem, but the subsequent usage of the file will. That's what I missed. In fact, it might be more intuitive to demonstrate in a little longer sample. 
>>> open('xxx', 'w').write('stuff') >>> assert open('xxx').read() == 'stuff' # Might fail ! The first file might not be closed yet ! Cheers. On Fri, Aug 20, 2010 at 8:39 PM, Paolo Giarrusso wrote: > On Fri, Aug 20, 2010 at 12:23, Donny Viszneki > wrote: > > Armin: Sakesun used "del f" and it appears you did not. > Actually, he didn't either. He said "I think that open(?xxx?, > ?w?).write(?stuff?)" is equivalent to using del (which he thought > would work), and the equivalence was correct. > > Anyway, in the _first reply_ message, he realized that using: > > ipy -c "open(?xxx?, ?w?).write(?stuff?)" > jython -c "open(?xxx?, ?w?).write(?stuff?)" > > made a difference (because the interpreter exited), so that problem > was solved. His mail implies that on PyPy he typed the code at the > prompt, rather than at -c. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hakan at debian.org Sat Aug 21 09:06:10 2010 From: hakan at debian.org (Hakan Ardo) Date: Sat, 21 Aug 2010 09:06:10 +0200 Subject: [pypy-dev] gpgpu and pypy In-Reply-To: References: Message-ID: Hi, here is a another effort allowing you to write GPU kernels using python, targeted at gpgpu. The programmer has to explicitly state the parallelism and there are restrictions on what kind of constructs are allowed in the kernels, but it's pretty cool: http://www.cs.lth.se/home/Calle_Lejdfors/pygpu/ On Sat, Aug 21, 2010 at 12:46 AM, Nick Bray wrote: > I can't speak for GPGPU, but I have compiled a subset of Python onto > the GPU for real-time rendering. ?The subset is a little broader than > RPython in some ways (for example, attributes are semantically > identical to Python) and a little narrower in some ways (many forms of > recursion are disallowed.) ?This big idea is that it allows you to > create a real-time rendering system with a single code base, and > transparently share functions and data structures between the CPU and > GPU. > > http://www.ncbray.com/pystream.html > http://www.ncbray.com/ncbray-dissertation.pdf > > It's at least ~100,000x faster than interpreting Python on the CPU. > "At least" because the measurements neglect doing things on the CPU > like texture sampling. ?This speedup is pretty obscene, but if you > break it down it isn't too unbelievable... 100x for interpreted -> > compiled, 10x for abstraction overhead of using floats instead of > doubles, 100x for using the GPU and using it for a task it was built > for. > > Parallelism issues are sidestepped by explicitly identifying the > parallel sections (one function processes every vertex, one function > processes every fragment), requiring the parallel sections have no > global side effects, and that certain I/O conventions are followed. > Sorry, no big answers here - it's essentially Pythonic stream > programming. > > The biggest issues with getting Python onto the GPU is memory. ?I was > actually targeting GLSL, not CUDA (it can't access the full rendering > pipeline), so pointers were not available. ?To work around this, the > code is optimized to an extreme degree to remove as many memory > operations as possible. ?The remaining memory operations are emulated > by splitting the heap into regions, indirecting through arrays, and > copying constant data wherever possible. ?From what I've seen this is > where PyPy would have the most trouble: its analysis algorithms are > good enough for inferring types and ?allowing compilation / > translation... 
> they aren't designed to enable aggressive optimization of memory operations (there's not a huge reason to do this if you're translating RPython into C... the C compiler will do it for you). In general, GPU programming doesn't work well with memory access (too many functional units, too little bandwidth). Most of the "C-like" GPU languages are designed so they can easily boil down into code operating out of registers. Python, on the other hand, is addicted to heap memory. Even if you target CUDA, eliminating memory operations will be a huge win.
>
> I'll freely admit there are some ugly things going on, such as the lack of recursion, the reliance on exhaustive inlining, requiring that GPU code follow a specific form, and not working well with container objects in certain situations (it needs to bound the size of the heap). In the end, however, it's a talking dog... the grammar may not be perfect, but the dog talks! If anyone has questions, either in private or on the list, I'd be happy to answer them. I have not done enough to advertise my project, and this seems like a good place to start.
>
> - Nick Bray
>
> 2010/8/20 Paolo Giarrusso :
>> 2010/8/20 Jorge Timón :
>>> Hi, I'm just curious about the feasibility of running Python code on a GPU by extending PyPy.
>>
>> Disclaimer: I am not a PyPy developer, even if I've been following the project with interest. Nor am I a GPU expert - I provide links to the literature I've read. Yet, I believe that such an attempt is unlikely to be interesting. Quoting Wikipedia's synthesis: "Unlike CPUs however, GPUs have a parallel throughput architecture that emphasizes executing many concurrent threads slowly, rather than executing a single thread very fast." And significant optimizations are needed anyway to get performance for GPU code (and if you don't need the last bit of performance, why bother with a GPU?), so I think that the need to use a C-like language is the smallest problem.
>>
>>> I don't have the time (and probably not the knowledge either) to develop that PyPy extension, but I just want to know if it's possible. I'm interested in languages like OpenCL and NVIDIA's CUDA because I think the future of supercomputing is going to be GPGPU.
>>
>> I would like to point out that while for some cases it might be right, the importance of GPGPU is probably often exaggerated:
>>
>> http://portal.acm.org/citation.cfm?id=1816021&coll=GUIDE&dl=GUIDE&CFID=11111111&CFTOKEN=2222222&ret=1#
>>
>> Researchers in the field are mostly aware of the fact that GPGPU is the way to go only for a very restricted category of code. For that code, fine. Thus, instead of running Python code on a GPU, designing from scratch an easy way to program a GPU efficiently for those tasks is better, and projects for that already exist (i.e. what you cite).
>>
>> Additionally, it would probably take a different kind of JIT to exploit GPUs. No branch prediction, very small non-coherent caches, no efficient synchronization primitives, as I read from this paper... I'm no expert, but I guess you'd need to re-architect the needed optimizations from scratch. And it took 20-30 years to get from the first, slow Lisp (1958) to, say, Self (1991), a landmark in performant high-level languages, derived from Smalltalk. Most of that would have to be redone.
>>
>> So, I guess that the effort to compile Python code for a GPU is not worth it.
>> There might be further reasons due to the kind of code a JIT generates, since a GPU has no branch predictor, no caches, and so on, but I'm no GPU expert and I would have to check again.
>>
>> Finally, for general-purpose code, exploiting the big expected number of CPUs on our desktop systems is already a challenge.
>>
>>> There are people working on bringing GPGPU to Python:
>>>
>>> http://mathema.tician.de/software/pyopencl
>>> http://mathema.tician.de/software/pycuda
>>>
>>> Would it be possible to run Python code in parallel without the need (for the developer) to actively parallelize the code?
>>
>> I would say that Python is not yet the language to use to write efficient parallel code, because of the Global Interpreter Lock (Google for "Python GIL"). The two implementations without a GIL are IronPython (as slow as CPython) and Jython (slower). PyPy has a GIL, and the current focus is not on removing it. Scientific computing uses external libraries (like NumPy) - for the supported algorithms, one could introduce parallelism at that level. If that's enough for your application, good. If you want to write a parallel algorithm in Python, we're not there yet.
>>
>>> I'm not talking about code of hard concurrency, but of code with intrinsic parallelism (let's say matrix multiplication).
>>
>> Automatic parallelization is hard, see:
>> http://en.wikipedia.org/wiki/Automatic_parallelization
>>
>> Lots of scientists have tried, lots of money has been invested, but it's still hard. The only practical approaches still require the programmer to introduce parallelism, but in ways much simpler than using multithreading directly. Google OpenMP and Cilk.
>>
>>> Would a JIT compilation be capable of detecting parallelism?
>>
>> Summing up what is above, probably not.
>>
>> Moreover, matrix multiplication may not be as easy as one might think. I do not know how to write it for a GPU, but in the end I reference some suggestions from that paper (where it is one of the benchmarks). Here, though, I explain why writing it for a CPU is complicated. You can multiply two matrices with a triply nested for loop, but such an algorithm has poor performance for big matrices because of bad cache locality. GPUs, according to the above-mentioned paper, provide no caches and hide latency in other ways.
>>
>> See here for the two main alternative ideas which allow solving this problem of writing an efficient matrix multiplication algorithm:
>> http://en.wikipedia.org/wiki/Cache_blocking
>> http://en.wikipedia.org/wiki/Cache-oblivious_algorithm
>>
>> Then, you need to parallelize the resulting code yourself, which might or might not be easy (depending on the interactions between the parallel blocks that are found there). In that paper, where matrix multiplication is called SGEMM (the BLAS routine implementing it), they suggest using a cache-blocked version of matrix multiplication for both CPUs and GPUs, and argue that parallelization is then easy.
>>
>> Cheers,
>> --
>> Paolo Giarrusso - Ph.D. Student
>> http://www.informatik.uni-marburg.de/~pgiarrusso/

--
Håkan Ardö
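[Editorial note, not part of the original thread: to make the cache-blocking point above concrete, here is a minimal sketch in pure Python. It is illustrative only - real code would call into NumPy/BLAS, and the block size of 64 is an arbitrary assumption, not a tuned value.]

    # Cache-blocked ("tiled") matrix multiplication, illustrative sketch only.
    # The three outer loops walk over blocks; within one block triple the
    # operands are small enough to stay cache-resident.
    def blocked_matmul(a, b, block=64):
        """Multiply square matrices a and b, given as lists of lists."""
        n = len(a)
        c = [[0.0] * n for _ in range(n)]
        for ii in range(0, n, block):            # blocks of rows of a / c
            for kk in range(0, n, block):        # blocks of the shared dimension
                for jj in range(0, n, block):    # blocks of columns of b / c
                    for i in range(ii, min(ii + block, n)):
                        row_c = c[i]
                        for k in range(kk, min(kk + block, n)):
                            a_ik = a[i][k]
                            row_b = b[k]
                            for j in range(jj, min(jj + block, n)):
                                row_c[j] += a_ik * row_b[j]
        return c

Parallelizing such code then mostly amounts to distributing the outermost block loop across threads (or GPU work groups), which is the "parallelization is then easy" argument made in the quoted mail.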
From cfbolz at gmx.de Sat Aug 21 10:25:39 2010
From: cfbolz at gmx.de (Carl Friedrich Bolz)
Date: Sat, 21 Aug 2010 10:25:39 +0200
Subject: [pypy-dev] gpgpu and pypy
In-Reply-To: References: Message-ID: <4C6F8D83.8060709@gmx.de>

Hi Paolo,

On 08/21/2010 01:46 AM, Paolo Giarrusso wrote:
[...]
> Your mention of slots is very cool! You made me recall that once you get shadow classes in Python, you can not only do inline caching, but you also have the _same_ object layout as in slots, because adding a member causes a hidden class transition, getting rid of any kind of dictionary _after compilation_. Two exceptions:
> * an immutable dictionary mapping field names to offsets is used both during JIT compilation and when inline caching fails;
> * a fallback case for when __dict__ is used is, I guess, needed. Not necessarily a dictionary, though: one could also make __dict__ usage just cause class transitions;
> * beyond a certain member count, i.e. if __dict__ is used as a general-purpose dictionary, one might want to switch back to a dictionary representation. This only applies if this is done in Pythonic code (guess not) - I remember this case from V8, for JavaScript, where the expected usage is different.

Just as a note: PyPy's Python interpreter does all this already, and I am working on making it even cooler :-).

[...]

Cheers,

Carl Friedrich

From hakan at debian.org Sat Aug 28 15:05:11 2010
From: hakan at debian.org (Hakan Ardo)
Date: Sat, 28 Aug 2010 15:05:11 +0200
Subject: [pypy-dev] Loop invariants
Message-ID:

Hi,
some time ago there was some discussion about loop invariants, but no conclusion. What do you think about the following approach:

- Let optimize_loop mark the arguments in loop.inputargs as invariant if they appear at the same position in the jump instruction at the end, before calling propagate_forward.

- Let the optimize_... methods emit operations that only use invariant arguments to some preamble, instead of emitting them to self.newoperations, whenever that is safe. Also, the results of these operations should probably be marked as invariant.

- Insert the created preamble at every point where the loop is called, right before the jump.

- When compiling a bridge from a failing guard, run the preamble through propagate_forward and discard the emitted operations, to inherit that part of the state of the Optimizer.

This should place the invariant instructions at the end of the entry bridge, which is a suitable place, right? At the end of a bridge from a failing guard that maintains the invariants, the optimizer should remove the inserted preamble again, right? And at the end of a bridge that invalidates them, enough of the preamble will be kept to maintain correct behavior, right?

--
Håkan Ardö

From william.leslie.ttg at gmail.com Sun Aug 29 00:05:06 2010
From: william.leslie.ttg at gmail.com (William Leslie)
Date: Sun, 29 Aug 2010 08:05:06 +1000
Subject: [pypy-dev] Loop invariants
In-Reply-To: References: Message-ID:

The other part of the work is the algorithm that finds loop variants. It is similar to the algorithm for variable colour inference, so you do have a starting point.

On 28/08/2010 11:12 PM, "Hakan Ardo" wrote:

Hi,
some time ago there was some discussion about loop invariants, but no conclusion.
What do you think about the following approach:

- Let optimize_loop mark the arguments in loop.inputargs as invariant if they appear at the same position in the jump instruction at the end, before calling propagate_forward.

- Let the optimize_... methods emit operations that only use invariant arguments to some preamble, instead of emitting them to self.newoperations, whenever that is safe. Also, the results of these operations should probably be marked as invariant.

- Insert the created preamble at every point where the loop is called, right before the jump.

- When compiling a bridge from a failing guard, run the preamble through propagate_forward and discard the emitted operations, to inherit that part of the state of the Optimizer.

This should place the invariant instructions at the end of the entry bridge, which is a suitable place, right? At the end of a bridge from a failing guard that maintains the invariants, the optimizer should remove the inserted preamble again, right? And at the end of a bridge that invalidates them, enough of the preamble will be kept to maintain correct behavior, right?

--
Håkan Ardö

From cfbolz at gmx.de Sun Aug 29 12:32:23 2010
From: cfbolz at gmx.de (Carl Friedrich Bolz)
Date: Sun, 29 Aug 2010 12:32:23 +0200
Subject: [pypy-dev] Loop invariants
In-Reply-To: References: Message-ID: <4C7A3737.50902@gmx.de>

Hi Håkan,

thanks for taking up the topic.

On 08/28/2010 03:05 PM, Hakan Ardo wrote:
> - Let optimize_loop mark the arguments in loop.inputargs as invariant if they appear at the same position in the jump instruction at the end, before calling propagate_forward.

sounds good.

> - Let the optimize_... methods emit operations that only use invariant arguments to some preamble, instead of emitting them to self.newoperations, whenever that is safe. Also, the results of these operations should probably be marked as invariant.

Need to be a bit careful about operations with side effects, but basically yes.

> - Insert the created preamble at every point where the loop is called, right before the jump.

This part makes sense to me. The code would have to be careful to match the variables in the trace and in the preamble.

> - When compiling a bridge from a failing guard, run the preamble through propagate_forward and discard the emitted operations, to inherit that part of the state of the Optimizer.

... but I don't see why this is needed. Wouldn't you rather need the whole trace of the loop including the preamble up to the failing guard? This would be bad, because you need to store the full trace then.

> This should place the invariant instructions at the end of the entry bridge, which is a suitable place, right? At the end of a bridge from a failing guard that maintains the invariants, the optimizer should remove the inserted preamble again, right? And at the end of a bridge that invalidates them, enough of the preamble will be kept to maintain correct behavior, right?

Yes to all the questions, at least as far as I can see. I guess in practice there might be complications.

Cheers,

Carl Friedrich

P.S.: A bit unrelated, but a comment on the jit-bounds branch: I think it would be good if the bounds-related optimizations could move out of optimizeopt.py to their own file, because otherwise optimizeopt.py is getting really unwieldy. Does that make sense?
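[Editorial note, not part of the original thread: a toy sketch of the hoisting step being discussed. All names (Op, split_preamble, ...) are invented for illustration and do not match PyPy's actual optimizer classes; the point is only that operations whose arguments are all invariant move to a preamble, and their results become invariant in turn.]

    # Invented, simplified model of hoisting loop-invariant operations.
    class Op:
        def __init__(self, result, name, args, has_side_effects=False):
            self.result = result
            self.name = name
            self.args = args
            self.has_side_effects = has_side_effects

    def split_preamble(inputargs, jumpargs, operations):
        # An input argument is invariant if the jump at the end of the loop
        # passes it along unchanged in the same position.
        invariant = set(a for a, j in zip(inputargs, jumpargs) if a == j)
        preamble, body = [], []
        for op in operations:
            if not op.has_side_effects and all(a in invariant for a in op.args):
                preamble.append(op)
                invariant.add(op.result)   # results of invariant ops are invariant too
            else:
                body.append(op)
        return preamble, body

    # i0 and i1 are passed through unchanged, so i3 = i0 * i1 can be hoisted;
    # i4 and i5 depend on the varying i2 and stay in the loop body.
    ops = [Op('i3', 'int_mul', ['i0', 'i1']),
           Op('i4', 'int_add', ['i2', 'i3']),
           Op('i5', 'int_sub', ['i4', 'i0'])]
    preamble, body = split_preamble(['i0', 'i1', 'i2'], ['i0', 'i1', 'i4'], ops)
    assert [op.result for op in preamble] == ['i3']
    assert [op.result for op in body] == ['i4', 'i5']

The questions left open in the thread - what happens at bridges, and whether the optimizer state can be recreated from the preamble alone - are exactly what such a sketch leaves out.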
From arigo at tunes.org Sun Aug 29 13:04:11 2010
From: arigo at tunes.org (Armin Rigo)
Date: Sun, 29 Aug 2010 13:04:11 +0200
Subject: [pypy-dev] Loop invariants
In-Reply-To: References: Message-ID: <20100829110411.GA13704@code0.codespeak.net>

Hi,

On Sat, Aug 28, 2010 at 03:05:11PM +0200, Hakan Ardo wrote:
> some time ago there was some discussion about loop invariants, but no conclusion.

A general answer to that question: there are two kinds of goals we can have when optimizing. One is to get the fastest possible code for small Python loops, e.g. doing numerical computations. The other is to get reasonably good code for large and complicated loops, e.g. the dispatch loop of some network application. Although loop-invariant code motion would definitely be great for the first kind of loops, it's unclear that it helps on the second kind of loops.

As a similar consideration, I am thinking about trying to remove the optimization that passes "virtuals" from one iteration of the loop to the next one. Although it has good effects on small loops, it actually has a negative effect on large loops, because the loop taking virtual arguments cannot be directly jumped to from the interpreter.

I'm not saying that loop-invariant code motion could also have a negative effect on large loops; I think it's a pure win, so it's probably worth a try. I'm just giving a warning: it may not help much in the case of a "general Python program doing lots of stuff", but only in the case of small numerical computation loops.

A bientot,

Armin.

From hakan at debian.org Sun Aug 29 13:49:23 2010
From: hakan at debian.org (Hakan Ardo)
Date: Sun, 29 Aug 2010 13:49:23 +0200
Subject: [pypy-dev] Loop invariants
In-Reply-To: <4C7A3737.50902@gmx.de> References: <4C7A3737.50902@gmx.de> Message-ID:

On Sun, Aug 29, 2010 at 12:32 PM, Carl Friedrich Bolz wrote:
> ... but I don't see why this is needed.

My thinking was that this would be needed for the preamble to be removed from the end of a bridge that maintains the invariants. But I might be mistaken?

> Wouldn't you rather need the whole trace of the loop including the preamble up to the failing guard? This would be bad, because you need to store the full trace then.

OK, so that might be a problem. Maybe it would be possible to extract which part of the state it would be safe to inherit even if only the preamble has been processed, i.e. self.pure_operations might be OK?

> P.S.: A bit unrelated, but a comment on the jit-bounds branch: I think it would be good if the bounds-related optimizations could move out of optimizeopt.py to their own file, because otherwise optimizeopt.py is getting really unwieldy. Does that make sense?

Well, class IntBound and the propagate_bounds_ methods could probably be moved elsewhere, but a lot of the work is done in the optimize_... methods, which I'm not so sure it would make sense to split up.

--
Håkan Ardö

From cfbolz at gmx.de Sun Aug 29 14:03:37 2010
From: cfbolz at gmx.de (Carl Friedrich Bolz)
Date: Sun, 29 Aug 2010 14:03:37 +0200
Subject: [pypy-dev] jit-bounds branch (was: Loop invariants)
In-Reply-To: References: <4C7A3737.50902@gmx.de> Message-ID: <4C7A4C99.2050803@gmx.de>

On 08/29/2010 01:49 PM, Hakan Ardo wrote:
>> P.S.: A bit unrelated, but a comment on the jit-bounds branch: I think it would be good if the bounds-related optimizations could move out of optimizeopt.py to their own file, because otherwise optimizeopt.py is getting really unwieldy. Does that make sense?
>
> Well, class IntBound and the propagate_bounds_ methods could probably be moved elsewhere, but a lot of the work is done in the optimize_... methods, which I'm not so sure it would make sense to split up.

I guess then the things that can be sanely moved should move. The file is nearly 2000 lines, which is way too big. I guess the heap optimizations could also go to their own file.

Carl Friedrich

From fijall at gmail.com Sun Aug 29 22:05:49 2010
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Sun, 29 Aug 2010 22:05:49 +0200
Subject: [pypy-dev] jit-bounds branch (was: Loop invariants)
In-Reply-To: <4C7A4C99.2050803@gmx.de> References: <4C7A3737.50902@gmx.de> <4C7A4C99.2050803@gmx.de> Message-ID:

On Sun, Aug 29, 2010 at 2:03 PM, Carl Friedrich Bolz wrote:
> On 08/29/2010 01:49 PM, Hakan Ardo wrote:
>>> P.S.: A bit unrelated, but a comment on the jit-bounds branch: I think it would be good if the bounds-related optimizations could move out of optimizeopt.py to their own file, because otherwise optimizeopt.py is getting really unwieldy. Does that make sense?
>>
>> Well, class IntBound and the propagate_bounds_ methods could probably be moved elsewhere, but a lot of the work is done in the optimize_... methods, which I'm not so sure it would make sense to split up.
>
> I guess then the things that can be sanely moved should move. The file is nearly 2000 lines, which is way too big. I guess the heap optimizations could also go to their own file.
>
> Carl Friedrich

How about a couple of files (preferably small), each containing a self-contained optimization if possible? (maybe a package?)

From hakan at debian.org Tue Aug 31 09:25:13 2010
From: hakan at debian.org (Hakan Ardo)
Date: Tue, 31 Aug 2010 09:25:13 +0200
Subject: [pypy-dev] jit-bounds branch (was: Loop invariants)
In-Reply-To: References: <4C7A3737.50902@gmx.de> <4C7A4C99.2050803@gmx.de> Message-ID:

OK, so we split it up into a set of Optimization classes in separate files, each containing a subset of the optimize_... methods. Then we have the propagate_forward method iterate over the instructions, passing them to one Optimization after the other? That way we keep the single iteration over the instructions. Would it be preferable to separate them even more and have each Optimization contain its own loop over the instructions?

On Sun, Aug 29, 2010 at 10:05 PM, Maciej Fijalkowski wrote:
> On Sun, Aug 29, 2010 at 2:03 PM, Carl Friedrich Bolz wrote:
>> On 08/29/2010 01:49 PM, Hakan Ardo wrote:
>>>> P.S.: A bit unrelated, but a comment on the jit-bounds branch: I think it would be good if the bounds-related optimizations could move out of optimizeopt.py to their own file, because otherwise optimizeopt.py is getting really unwieldy. Does that make sense?
>>>
>>> Well, class IntBound and the propagate_bounds_ methods could probably be moved elsewhere, but a lot of the work is done in the optimize_... methods, which I'm not so sure it would make sense to split up.
>>
>> I guess then the things that can be sanely moved should move. The file is nearly 2000 lines, which is way too big. I guess the heap optimizations could also go to their own file.
>>
>> Carl Friedrich
>
> How about a couple of files (preferably small), each containing a self-contained optimization if possible? (maybe a package?)

--
Håkan Ardö
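[Editorial note, not part of the original thread: a toy sketch of the structure being discussed - several small Optimization objects, each of which could live in its own module, chained behind a single propagate_forward pass so the trace is still walked only once. All class names and the operation format are invented for illustration; PyPy's real optimizer looks different.]

    # Invented, simplified model of chaining optimization passes over a trace.
    class Optimization:
        def propagate_forward(self, op, emit):
            emit(op)                    # default: pass the operation through unchanged

    class ConstantFoldAdd(Optimization):
        def propagate_forward(self, op, emit):
            if op[0] == 'int_add' and isinstance(op[1], int) and isinstance(op[2], int):
                emit(('const', op[1] + op[2]))   # fold an addition of two constants
            else:
                emit(op)

    class RemoveDuplicateGuards(Optimization):
        def __init__(self):
            self.seen = set()
        def propagate_forward(self, op, emit):
            if op[0] == 'guard_true':
                if op[1] in self.seen:
                    return              # this value was already guarded: drop the guard
                self.seen.add(op[1])
            emit(op)

    def propagate_forward_chain(optimizations, operations):
        """Single iteration over the trace; each op flows through every pass."""
        result = []
        emit = result.append
        for opt in reversed(optimizations):
            emit = (lambda op, nxt=emit, opt=opt: opt.propagate_forward(op, nxt))
        for op in operations:
            emit(op)
        return result

    trace = [('int_add', 2, 3), ('guard_true', 'i0'), ('guard_true', 'i0')]
    assert propagate_forward_chain([ConstantFoldAdd(), RemoveDuplicateGuards()],
                                   trace) == [('const', 5), ('guard_true', 'i0')]

Whether each pass should instead run its own full loop over the instructions - the alternative raised above - is then mostly a trade-off between simplicity and walking the trace several times.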
From hakan at debian.org Tue Aug 31 09:20:15 2010
From: hakan at debian.org (Hakan Ardo)
Date: Tue, 31 Aug 2010 09:20:15 +0200
Subject: [pypy-dev] Loop invariants
In-Reply-To: <20100829110411.GA13704@code0.codespeak.net> References: <20100829110411.GA13704@code0.codespeak.net> Message-ID:

On Sun, Aug 29, 2010 at 1:04 PM, Armin Rigo wrote:
> I'm not saying that loop-invariant code motion could also have a negative effect on large loops; I think it's a pure win, so it's probably worth a try. I'm just giving a warning: it may not help much in the case of a "general Python program doing lots of stuff", but only in the case of small numerical computation loops.

Right. I write a lot of numerical computation loops these days, both small and somewhat bigger, and I am typically forced to write them in C to get decent performance. So the motivation here would rather be to broaden the usability of Python than to improve the performance of existing Python programs.

Another motivation might be to help PyPy developers focus on the important instructions while staring at traces, i.e. by hiding the instructions that will be inserted only once :)

--
Håkan Ardö

From fijall at gmail.com Tue Aug 31 10:38:22 2010
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Tue, 31 Aug 2010 10:38:22 +0200
Subject: [pypy-dev] Loop invariants
In-Reply-To: References: <20100829110411.GA13704@code0.codespeak.net> Message-ID:

On Tue, Aug 31, 2010 at 9:20 AM, Hakan Ardo wrote:
> On Sun, Aug 29, 2010 at 1:04 PM, Armin Rigo wrote:
>> I'm not saying that loop-invariant code motion could also have a negative effect on large loops; I think it's a pure win, so it's probably worth a try. I'm just giving a warning: it may not help much in the case of a "general Python program doing lots of stuff", but only in the case of small numerical computation loops.
>
> Right. I write a lot of numerical computation loops these days, both small and somewhat bigger, and I am typically forced to write them in C to get decent performance. So the motivation here would rather be to broaden the usability of Python than to improve the performance of existing Python programs.
>
> Another motivation might be to help PyPy developers focus on the important instructions while staring at traces, i.e. by hiding the instructions that will be inserted only once :)

I second Hakan here - small loops are not uninteresting, since this broadens the areas where you can use Python, not limiting yourself to existing Python programs.