[pypy-dev] FW: Would the following shared memory model be possible?

Kevin Ar18 kevinar18 at hotmail.com
Sun Aug 1 04:09:28 CEST 2010


> > I have no idea what I did to warrant your hateful replies towards me, but
> > they really are not appropriate (in public or private email).
>
> I had absolutely no intention of offending you, and am deeply sorry
> for any offense that I may have caused you.

I must admit, I'm rather surprised by your reply -- and also, thank you.  I'm sorry for the trouble I caused you with this.  I had hoped for a good conversation about the issues related to Kamaelia, yet every time I got a reply back, it seemed like you were mad at me for some unknown reason.
 
As a simple example of what I mean: in your first email, you mentioned a lot of different programming styles related to FBP and Kamaelia.  Since I am interested in parallel "research", I put those words into Google and made a whole bookmark section so that I would have them for future study.  When I replied back, I figured this would be a good way to lighten the mood, so I thanked you for the info and asked for any more links/ideas you might want to mention -- a shared point of interest seemed like a good way to foster a friendly atmosphere.  Unfortunately, I assume you must have misunderstood me, because instead of stirring up a friendly interest, I received several paragraphs about me being inconsiderate (not searching Google for something) and putting an undue burden on you.
 
At this point, it would be really unfair to talk about it further.  To sum things up, I got the impression that you were mad at me for some unknown reason: it was like each successive email was going further and further downhill -- and I didn't know why.
However, in the end, I am glad that the whole situation could be resolved the way it has been.
 
 
> I'll bow out at this point.
I wouldn't want you to have to do that; your input can be very useful to people.
I apologized, you apologized...  Some stuff was cleared up, etc...  I don't think anybody here is holding a grudge or going to rehash the topic again (you and me included).
 
You have very specific knowledge related to Kamaelia that could be useful to people exploring micro-threading implementations, parallel computing, etc....
 
---------------
Now, to change the topic slightly (and hopefully in a positive way).
 
 
I'm not sure if it really matters to you, but I have been considering another possible way to make a parallel tasklet (e.g. for FBP and Kamaelia) in PyPy... but I don't have 3+ months to spend ironing out the flaws, learning PyPy, writing an implementation, etc. ... and to be honest, I would not feel comfortable asking someone else (here or otherwise) to try and make something for my benefit.
 
On another note... something that might actually interest you: I have done some work on a graphical front-end for FBP... nothing super special, mind you, but I could keep you informed in the future if it is something of interest to you.
 
 
 
Anyway, I hope this email ends things on a positive note for you and everyone else.
Kevin
 
 
> > I have no idea what I did to warrant your hateful replies towards me, but
> > they really are not appropriate (in public or private email).
>
> I had absolutely no intention of offending you, and am deeply sorry
> for any offense that I may have caused you.
>
> In my reply I merely wanted to flag that I don't have time to go into
> everything (like most people), that asking questions in a public realm
> is better because you may then get answers from multiple people, and
> that people who appear to do some research first tend to get better
> answers. I also tried to give an example, but that doesn't appear to
> have been helpful. (I'm fallible like everyone else)
>
> My intention there was to be helpful and to explain why I have that
> view of only replying on list, and it appears to have offended you
> instead, and I apologise. (one person's direct and helpful speech in
> one place can be a mortal insult somewhere else)
>
> After those couple of paragraphs, I tried to add to your discussion by
> replying to your specific points which you asked about parallel
> execution, noting places and examples where it is possible today. (to
> varying degrees of satisfaction) I then also tried to answer your
> point of "if something extra could be done, what would probably be
> generally useful". To that I noted that *my* talk there was cheap, and
> that execution was hard.
>
> Somehow along the way, my intent to try to be helpful to you has
> resulted in offending and upsetting you, and for that I am truly sorry
> - life is simply too short for people to upset each other, and in no
> way was my post intended as "hateful", and once again, my apologies.
> In future please assume good intentions - I assumed good intentions on
> your part.
>
> I'll bow out at this point.
>
> Best Regards,
>
>
> Michael.
>
> >
> >> Date: Sat, 31 Jul 2010 02:08:49 +0100
> >> Subject: Re: [pypy-dev] FW: Would the following shared memory model be
> >> possible?
> >> From: sparks.m at gmail.com
> >> To: kevinar18 at hotmail.com
> >> CC: pypy-dev at codespeak.net
> >>
> >> On Thu, Jul 29, 2010 at 6:44 PM, Kevin Ar18 wrote:
> >> > You brought up a lot of topics. I went ahead and sent you a private
> >> > email.
> >> > There's always lots of interesting things I can add to my list of things
> >> > to
> >> > learn about. :)
> >>
> >> Yes, there are lots of interesting things. I have a limited amount of
> >> time, however (I should be in bed, it's very late here, but I do /try/
> >> to reply to on-list mails), so cannot spoon-feed you. Mailing me
> >> directly rather than a (relevant) list precludes you getting answers
> >> from someone other than me. Not being on lists also precludes you
> >> getting answers to questions by chance. Changing emails and names in
> >> email headers also makes keeping track of people hard...
> >>
> >> (For example you asked off list last year about Kamaelia's license
> >> from a different email address. Since it wasn't searchable I
> >> completely forgot. You also asked all sorts of questions but didn't
> >> want the answers public, so I didn't reply. If instead you'd
> >> subscribed to the list, and asked there, you'd've found out that
> >> Kamaelia's license changed - to the Apache Software License v2 ...)
> >>
> >> If I mention something you find interesting, please Google first and
> >> then ask publicly somewhere relevant. (the answer and question are
> >> then googleable, and you're doing the community a service IMO if you
> >> ask q's that way - if your question is somewhere relevant and shows
> >> you've already googled prior work as far as you can... People are
> >> always willing to help people who show they're willing to help
> >> themselves, in my experience.)
> >>
> >> just looks to me that you're tying yourself up in knots over things
> >> >> that aren't problems, when there are some things which could be useful
> >> >> (in practice) & interesting in this space.
> >> > The particular issue in this situation is that there is no way to make
> >> > Kamaelia, FBP, or other concurrency concepts run in parallel (unless you
> >> > are
> >> > willing to accept lots of overhead like with the multiprocessing
> >> > queues).
> >> >
> >> > Since you have worked with Kamaelia code a lot... you understand a lot
> >> > more
> >> > about implementation details. Do you think the previous shared memory
> >> > concept or something like it would let you make Kamaelia parallel?
> >> > If not, can you think of any method that would let you make Kamaelia
> >> > parallel?
> >>
> >> Kamaelia already CAN run components in parallel in different processes
> >> (has been able to do so for quite some time) or on different
> >> processors. Indeed, all you do is use a ProcessPipeline or
> >> ProcessGraphline rather than Pipeline or Graphline, and the components
> >> in the top level are spread across processes. I still view the code as
> >> experimental, but it does work, and when needed is very useful.
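> >>
> >> As a rough illustration -- a minimal sketch from memory, untested as
> >> typed, and assuming the Kamaelia.Experimental.Process module path --
> >> going parallel is just a matter of swapping the chassis:
> >>
> >>   # ProcessPipeline spreads its children across OS processes
> >>   # (module path from memory - check the Kamaelia sources)
> >>   from Kamaelia.Experimental.Process import ProcessPipeline
> >>   from Kamaelia.Util.Console import ConsoleReader, ConsoleEchoer
> >>
> >>   ProcessPipeline(
> >>       ConsoleReader(),   # runs in its own process
> >>       ConsoleEchoer(),   # as does this
> >>   ).run()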
> >>
> >> Kamaelia running on IronPython can happily run on separate processors,
> >> sharing data efficiently (due to the lack of a GIL there). Threaded
> >> components there do that naturally - I don't use IronPython myself,
> >> but Kamaelia does run on it. On Windows this is easiest, though Mono
> >> works just as well.
> >>
> >> I believe Jython is also GIL-free, and Kamaelia's Axon runs there
> >> cleanly too. Because Kamaelia is pure Python, it runs truly in
> >> parallel there as well (based on reports from people using Kamaelia
> >> on Jython). CPython is the exception (and a rather big one at that).
> >> (PyPy has a choice, IIUC)
> >>
> >> Personally, I think if PyPy worked with generators better (which is
> >> why I keep an eye on PyPy) and cpyext was improved, it'd provide a
> >> really compelling platform for me. (I was rather gutted at EuroPython
> >> to hear that PyPy's generator support was still ... problematic)
> >>
> >> Regarding the *efficiency* and *enforcement* of the approach taken, I
> >> feel you're barking up the wrong tree, but let's go there.
> >>
> >> What approach does baseline (non-IronPython) Kamaelia take for
> >> multi-process work?
> >>
> >> For historical reasons, it builds on top of pprocess rather than the
> >> multiprocessing module. This means that for interprocess
> >> communications objects are pickled before being sent over operating
> >> system pipes.
> >>
> >> This provides an obvious communications overhead - and this isn't
> >> really kamaelia specific at this point.
> >>
> >> However, shifting data from one CPU to another is expensive, and only
> >> worth doing in some circumstances. (Consider a machine with several
> >> physical CPUs - each has a local CPU cache, and the data needs to be
> >> transferred from one to another, which is partly why people worry
> >> about thread/CPU affinity etc)
> >>
> >> Basically, if you can manage it, you don't want to shift data between
> >> CPUs, you want to partition the processing.
> >>
> >> ie you may want to start caring about the size of messages and number
> >> of messages going between processes. Sending small and few between
> >> processes is going to be preferable to sending large and many for
> >> throughput purposes.
> >>
> >> In the case of small and few, the approach of pickling and sending
> >> across OS pipes isn't such a bad idea. It works.
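> >>
> >> Mechanically it is nothing more exotic than this (a single-process
> >> sketch for brevity - pprocess does the real bookkeeping):
> >>
> >>   import os
> >>   import pickle
> >>
> >>   r, w = os.pipe()                        # the OS-level channel
> >>   blob = pickle.dumps({"seq": 42, "payload": "small message"})
> >>   os.write(w, blob)                       # sender side
> >>   msg = pickle.loads(os.read(r, 65536))   # receiver side
> >>
> >> Both the pickling and the copy through the kernel are pure overhead,
> >> which is another reason small-and-few beats large-and-many.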
> >>
> >> If you do want to share data between CPUs, and it sounds like you do,
> >> then most OSs already provide a means of doing that - threads. The
> >> conventions people use for using threads are where they become
> >> unpicked, but as a mechanism, threads do generally work, and work
> >> well.
> >>
> >> As well as channels/boxes, you can use an STM approach, such as that
> >> in Axon.STM ...
> >> * http://www.kamaelia.org/STM.html
> >> *
> >> http://code.google.com/p/kamaelia/source/browse/trunk/Code/Python/Bindings/STM/
> >>
> >> ...which is logically very similar to version control for variables. A
> >> downside of STM (at least with this approach), however, is that for it
> >> to work you need either copy-on-write semantics for objects, full
> >> copying of objects, or similar. Personally I use a biological metaphor
> >> here, in that channels/boxes and components, and similar perform a
> >> similar function to axons and neurons in the body, and that STM is
> >> akin to the hormonal system for maintaining and controlling system
> >> state. (I modelled biological tree growth many moons ago)
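> >>
> >> The version-control feel shows in the API - a minimal sketch from
> >> memory (see the STM page above for the authoritative details):
> >>
> >>   from Axon.STM import Store, ConcurrentUpdate
> >>
> >>   S = Store()
> >>   balance = S.usevar("balance")     # check out a working copy
> >>   while True:
> >>       try:
> >>           balance.set(100)
> >>           balance.commit()          # check the change back in
> >>           break
> >>       except ConcurrentUpdate:      # someone else committed first
> >>           balance = S.usevar("balance")   # re-checkout and retry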
> >>
> >> Anyhow, coming back to threads, that brings us back to python, and
> >> implementations with a GIL, and those without.
> >>
> >> For implementations with a GIL, you then have a choice: do I choose to
> >> try and implement a memory model that _enforces_ data locality? That
> >> is, if a piece of data is in use inside a single "process" or "thread"
> >> (from here on I'll use "task" as a generic term), then trying to use
> >> it inside another causes a problem for the task attempting to breach
> >> the model.
> >>
> >> In order to enforce this, I personally believe you'd need to use
> >> multiple processes, and only share data through dedicated code
> >> managing shared memory. You could of course do this outside user code.
> >> To do this you'd need an abstraction that made sense, and something
> >> like stackless' channels or kamaelia's (in/out) box model makes sense
> >> there. (The CELL API uses a mailbox metaphor as well for reference)
> >>
> >> In that case, you have a choice. You either copy the data into shared
> >> memory, or you share the data in situ. The former gives you back
> >> precisely the same overhead previously described, while the latter
> >> fragments your memory (since you can no longer access it). You could
> >> also have compaction.
> >>
> >> However, personally, I think any possible benefits here are outweighed
> >> by the costs and complexity.
> >>
> >> The alternative is to _encourage_ data locality. That is, encourage
> >> the usage and sharing of data such that, whilst you could share data
> >> between tasks and cause corruption, the common way of using the
> >> system discourages such actions. In essence that's what I try to do in
> >> Kamaelia, and it seems to work. Specifically, the model says:
> >>
> >> * If I take a piece of data from an inbox, I own it and can do anything
> >> with it that I like. If you think of a physical piece of paper and
> >> I take it from an intray, then that really is the case.
> >>
> >> * If I put a piece of data in an outbox, I no longer own it and should
> >> not attempt to do anything more with it. Again, using a physical
> >> metaphor, and naming scheme helps here. In particular, if I put a
> >> piece of paper in the post, I can no longer modify it. How it gets
> >> to its recipient is not my concern either.
> >>
> >> In practice this does actually work. If you add in immutable tuples
> >> and immutable strings, then it becomes a lot clearer how this can work.
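> >>
> >> In component form the discipline looks like this (a minimal sketch
> >> using Axon's component API):
> >>
> >>   from Axon.Component import component
> >>
> >>   class Doubler(component):
> >>       def main(self):
> >>           while True:
> >>               while self.dataReady("inbox"):
> >>                   msg = self.recv("inbox")   # taken from the intray,
> >>                   msg = msg * 2              # so ours to mutate...
> >>                   self.send(msg, "outbox")   # ...posted: hands off now
> >>               yield 1
> >>
> >> Wire a few of those into a Pipeline and the ownership handover is the
> >> only sharing convention anyone needs to follow.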
> >>
> >> Is there a risk here of accidental modification? Yes. However, the
> >> size and general simplicity of components tends to lead to such
> >> problems being picked up early. It also enables component level
> >> acceptance tests. (We tend to build small examples of usage, which in
> >> turn effectively form acceptance tests)
> >>
> >> [ An alternative is to make the "send" primitive make a copy on send.
> >> That would be quite an overhead, and also limit the types of data you
> >> can send. ]
> >>
> >> In practical terms, it works. (Stackless proves this as well IMO,
> >> since despite some differences, there are also lots of similarities)
> >>
> >> The other question that arises, is "isn't the GIL a problem with
> >> threads?". Well, the answer to that really depends on what you're
> >> doing. David Beazley's talk on what happens when mixing different sorts
> >> of threads shows that it isn't ideal, and if you're hitting that
> >> behaviour, then actually switching to real processes makes sense.
> >> However if you're doing CPU intensive work inside a C extension which
> >> releases the GIL (eg numpy), then it's less of an issue in practice.
> >> Custom extensions can do the same.
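> >>
> >> eg something like this keeps two cores busy even on CPython, assuming
> >> a numpy build whose dot() releases the GIL (BLAS-backed builds do):
> >>
> >>   import threading
> >>   import numpy
> >>
> >>   a = numpy.ones((2000, 2000))
> >>   results = [None, None]
> >>
> >>   def crunch(i):
> >>       results[i] = numpy.dot(a, a).sum()   # GIL dropped inside dot()
> >>
> >>   threads = [threading.Thread(target=crunch, args=(i,)) for i in (0, 1)]
> >>   for t in threads: t.start()
> >>   for t in threads: t.join()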
> >>
> >> So, for example, picking something which I know colleagues [1] at work
> >> do, you can use a DVS broadcast capture card to capture video frames,
> >> pass those between threads which are doing processing on them, and
> >> inside those threads use c extensions to process the data efficiently
> >> (since image processing does take time...), and those release the GIL
> >> boosting throughput.
> >>
> >> [1] On this project :
> >> http://www.bbc.co.uk/rd/projects/2009/10/i3dlive.shtml
> >>
> >> So, that makes it all sound great - ie things can, after various
> >> fashions, run in parallel on various versions of python, to practical
> >> benefit. But obviously it could be improved.
> >>
> >> Personally, I think the project most likely to make a difference here
> >> is actually pypy. Now, talk is very cheap, and easy, and I'm not
> >> likely to implement this, so I'll aim to be brief. Execution is hard.
> >>
> >> In particular, what I think is most likely to be beneficial is
> >> something _like_ this:
> >>
> >> Assume pypy runs without a GIL. Then allow the creation of a green
> >> process. A green process is implemented using threads, but with data
> >> created on the heap such that it defaults to being marked private to
> >> the thread (ie ala thread local storage, but perhaps implemented
> >> slightly differently - via references from the thread local storage
> >> into the heap) rather than shared. Sharing between green processes
> >> (for channels or boxes) would "simply" be detagged as being owned by
> >> one thread, and passed to another.
> >>
> >> In particular this would mean that you need a mechanism for doing
> >> this. Simply attempting to call another green process (or thread) from
> >> another with mutable data types would be sufficient to raise the
> >> equivalent of a segmentation fault.
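> >>
> >> At user level, the handover such a runtime would enforce looks roughly
> >> like this (hypothetical names throughout - the real thing would live
> >> in the runtime, not in a wrapper class):
> >>
> >>   import threading
> >>   import Queue
> >>
> >>   class OwnershipError(Exception): pass
> >>
> >>   class Owned(object):
> >>       """Hypothetical: the runtime would tag heap objects natively."""
> >>       def __init__(self, value):
> >>           self.value = value
> >>           self.owner = threading.current_thread()
> >>       def use(self):
> >>           if self.owner is not threading.current_thread():
> >>               raise OwnershipError("owned by another green process")
> >>           return self.value
> >>
> >>   channel = Queue.Queue()      # one "box" between two green processes
> >>
> >>   def send(channel, item):     # hand over: detag, then enqueue
> >>       item.use()               # only the current owner may send
> >>       item.owner = None
> >>       channel.put(item)
> >>
> >>   def recv(channel):           # receiver claims ownership on arrival
> >>       item = channel.get()
> >>       item.owner = threading.current_thread()
> >>       return item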
> >>
> >> Secondly, improve cpyext to the extent that each CPython extension
> >> gets its own version of the GIL (ie each extension runs with its own
> >> logical runtime, and thinks that it has its own GIL which it can lock
> >> and release; in practice it's faked by the PyPy runtime). This is
> >> conceptually similar to creating green processes.
> >>
> >> It's worth considering that the Linux kernel went through similar
> >> changes, in that in the 2.0 days there was a single big lock, which
> >> was replaced by ever more granular locks. I personally think that
> >> since there are so many extensions that rely on the existence of the
> >> GIL, simply waving a wand to get rid of it isn't likely. However,
> >> logically providing a GIL per C extension may be plausible, and _may_
> >> be sufficient.
> >>
> >> However, I don't know - it might well not - I've not looked at the
> >> code, and talk is cheap - execution is hard.
> >>
> >> Hopefully the above (cheap :) comments are in some small way useful.
> >>
> >> Regards,
> >>
> >>
> >> Michael.

