Help me use my Dual Core CPU!

Thu Sep 28 15:43:27 EDT 2006

Paul Rubin wrote:

> "Michael Sparks" <sparks.m at gmail.com> writes:
>> > Kamaelia doesn't attempt concurrency at all.  Its main idea is to use
>> > generators to simulate microthreads.
>> 
>> Regarding Kamaelia, that's not been the case for over a year now.
>> 
>> We've had threaded components as well as generator based ones since
>> around last July, however their API stablised properly about 4 months
>> back. If you use C extensions that release the GIL and are using an OS
>> that puts threads on different CPUs then you have genuine concurrency.
>> (those are albeit some big caveats, but not uncommon ones in python).
> 
> Oh neat, this is good to hear.

:) 

Ironically, it was worth mentioning because earlier this year we made a number
of optimisations specifically to lower the CPU usage of Kamaelia systems in
general. That allows us to take advantage of multiple CPUs where available,
for an internal project described here:
   * http://kamaelia.sourceforge.net/KamaeliaMacro.html

A "look, but can't get" front end here:

   * http://bbc.kamaelia.org/cgi-bin/blog/blog.cgi

Code here:

http://svn.sourceforge.net/viewvc/kamaelia/trunk/Code/Python/Kamaelia/Examples/DVB_Systems/Macro.py?view=markup

If there were (say) 4 CPUs or cores on that system, then the graphline at
the end could become:

Graphline(
     SOURCE=DVB_Multiplex(freq, pids["NEWS24"] + 
                                pids["BBC ONE"] + 
                                pids["CBBC"] + 
                                pids["BBC TWO"]+pids["EIT"],
                          feparams),
     DEMUX=DVB_Demuxer({
         600: ["BBCONE"],
         601: ["BBCONE"],
         610: ["BBCTWO"],
         611: ["BBCTWO"],
         620: ["CBBC"],
         621: ["CBBC"],
         640: ["NEWS24"],
         641: ["NEWS24"],
         18: ["BBCONE","BBCTWO", "CBBC","NEWS24"],
     }),
     NEWS24 = ChannelTranscoder(service_ids["NEWS24"], **params["HI"]),
     BBCONE = ChannelTranscoder(service_ids["BBC ONE"], **params["HI"]),
     BBCTWO = ChannelTranscoder(service_ids["BBC TWO"], **params["HI"]),
     CBBC = ChannelTranscoder(service_ids["CBBC"], **params["HI"]),
     linkages={
       ("SOURCE", "outbox"):("DEMUX","inbox"),
       ("DEMUX", "NEWS24"): ("NEWS24", "inbox"),
       ("DEMUX", "BBCONE"): ("BBCONE", "inbox"),
       ("DEMUX", "BBCTWO"): ("BBCTWO", "inbox"),
       ("DEMUX", "CBBC"): ("CBBC", "inbox"),
     }
).run()

And that would naturally take advantage of all 4 CPUs.
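
As a rough illustration of what the DEMUX mapping above expresses (this is a
sketch, not the actual DVB_Demuxer code; the route() helper and the packet
tuples are made up for the example), each PID names the outboxes its packets
should be copied to, and PID 18 is listed against all four channels:

```python
from collections import defaultdict

# Same shape as the mapping handed to DVB_Demuxer above: PID -> outbox names.
PID_MAP = {
    600: ["BBCONE"], 601: ["BBCONE"],
    610: ["BBCTWO"], 611: ["BBCTWO"],
    620: ["CBBC"], 621: ["CBBC"],
    640: ["NEWS24"], 641: ["NEWS24"],
    18: ["BBCONE", "BBCTWO", "CBBC", "NEWS24"],
}


def route(pid_map, packets):
    """Fan (pid, payload) pairs out to named outboxes (hypothetical helper)."""
    outboxes = defaultdict(list)
    for pid, payload in packets:
        # Packets whose PID is not in the map are simply dropped.
        for name in pid_map.get(pid, []):
            outboxes[name].append(payload)
    return dict(outboxes)


if __name__ == "__main__":
    packets = [(600, "video"), (18, "shared"), (640, "news"), (999, "junk")]
    for name, items in sorted(route(PID_MAP, packets).items()):
        print(name, items)
```

Once the stream is split like this, each named outbox feeds exactly one
consumer, which is what makes it plausible to hand each consumer to its own
CPU.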

Admittedly this is in a limited scenario right now, and is the exception not
the rule, but does give an idea of where we'd like to end up, even if at
the moment, like most people's machines, we default to making the most of a
single CPU :-)

So, we don't automatically parallelise your code, or magically run across
multiple CPUs, and whether that ever happens really depends on how practical
it is. (I must admit I suspect it is doable though, and will be something
worth addressing, and I'll look at it at some point if no-one else does :-)

At the moment this means components have to be written explicitly to work
that way (as the ones above do, because of the way the transcoder works).
However, I suspect explicit parallelisation, or hinting for parallelisation
(e.g. via a base class, or perhaps a metaclass), will be doable and can be
intuitive and practical :-) I might be alone in believing that of course :-)

>> Personally, I'm very much in the camp that says "shared data is
>> invariably a bad idea unless you really know what you're doing"
>> (largely because it's the most common source of bugs for people where
>> they're trying to do more than one thing at a time). People also
>> generally appear to find writing threadsafe code very hard. (not
>> everyone, just the people who aren't at the top end of the bell curve
>> for writing code that does more than one thing at a time)
> 
> I don't think it's that bad.  Yes, free-threaded programs synchronized
> by seat-of-the-pants locking turns to a mess pretty quickly. 

That's an agreement with my point (or rather, I'm agreeing with yours).

> But ordinary programmers write real-world applications with shared data
> all the time, namely database apps.  

I don't call that shared data because access to the shared data is
arbitrated by a third party - namely the database. I mean where 2 or more
people[*] hold a lock on an object and share it - specifically the kind of
thing you reference above as turning into a mess.

[*] Sorry, I have a nasty habit of thinking of software as little robots or
    people. I blame Usborne books of the early 80s for that :-)

> This is just silly, and wasteful of the
> efforts of the hardworking chip designers who put that nice cache
> coherence circuitry into our CPU's, to mediate shared data access at
> the sub-instruction level so we don't need all that IPC hair.

Aside from the fact it's enabled millions of programmers to deal with
shared data by communicating with a database?

> Basically if the hardware gods have blessed us with concurrent cpu's
> sharing memory, it's our nerdly duty to figure out how to use it. 

It's also our duty to figure out how to make it easier for the bulk of
programmers though. After all, that's the point of an operating system or
programming language in many respects, or can be the aim of a library :-)
(your mileage may well vary)

Incidentally, it's worth noting that for the bulk of components (not all)
we don't do copying between things anymore. We used to before doing
optimisations earlier this year because we took the simplest, most
literal implementation of the metaphor as the starting point (a postman
picking up messages from an outbox, walking to another desk/component
and delivering to the inbox).

For generator-based components we collapse inboxes into outboxes, which means
that when someone puts a piece of data into an outbox, all they're doing is
saying "I'm no longer going to use this", and the recipient can use it
straight away.

This is free of traditional locks, and at the same time encourages safety
because of the real world metaphor - once I post something through a letter
box, I can't do anything more with it. This also has natural performance
benefits. Sure people can break the rules and odd things can happen, but
the metaphor encourages people not to do that.
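
A minimal sketch of that idea in plain Python, assuming nothing about
Kamaelia's real internals (producer, consumer and run_demo are names invented
for the example): one deque acts as both the sender's outbox and the
receiver's inbox, so "delivery" hands over the very same object with no copy.

```python
from collections import deque


def producer(outbox):
    """Generator component: posts three messages, one per turn."""
    for n in range(3):
        message = {"seq": n}
        outbox.append(message)  # "posted": the producer never touches it again
        yield


def consumer(inbox, received):
    """Generator component: drains whatever has arrived in its inbox."""
    while True:
        while inbox:
            received.append(inbox.popleft())
        yield


def run_demo():
    # One deque plays both roles: the producer's outbox IS the consumer's
    # inbox, so there is no separate postman step and nothing is copied.
    box = deque()
    received = []
    source = producer(box)
    sink = consumer(box, received)
    sent_ids = []
    # A trivial round-robin scheduler: give each generator a turn in sequence.
    for _ in range(3):
        next(source)
        sent_ids.append(id(box[-1]))
        next(sink)
    return sent_ids, received


if __name__ == "__main__":
    ids, messages = run_demo()
    print(messages)
    print([id(m) for m in messages] == ids)
```

Nothing here stops the producer keeping a reference and modifying the message
later; as with the letter box, the safety comes from the convention, not from
enforcement.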

(For thread based components Queue.Queues are used)
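
For comparison, a small sketch of the threaded case using the standard
library's thread-safe queue (Queue.Queue in the Python 2 of the time,
queue.Queue in Python 3). The worker function and the None shutdown marker
are assumptions made for the example, not Kamaelia's API:

```python
import queue
import threading


def worker(inbox, outbox):
    """Threaded component: upper-cases each message until told to stop."""
    while True:
        item = inbox.get()  # blocks until something is delivered
        if item is None:  # None is this sketch's shutdown marker
            break
        outbox.put(item.upper())


def run_demo(messages):
    inbox, outbox = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(inbox, outbox))
    thread.start()
    for message in messages:
        inbox.put(message)
    inbox.put(None)
    thread.join()
    # The worker has finished, so the outbox can be drained without blocking.
    results = []
    while not outbox.empty():
        results.append(outbox.get())
    return results


if __name__ == "__main__":
    print(run_demo(["hello", "world"]))
```

The queue does the locking internally, so the component code on either side
still only ever sees "put something in a box" and "take something out".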

> We need better abstractions than raw locks, but this is hardly new.
> Assembly language programmers made integer-vs-pointer aliasing or
> similar type errors all the time, so we got compiled languages with
> type consistency enforcement.  Goto statements turned ancient Fortran
> code into spaghetti, so we got languages with better control
> structures.  We found memory allocation bookkeeping to do by hand in
> complex programs, so we use garbage collection now.

I think you're essentially agreeing in principle here or I'm agreeing with
you in principle.

> And to deal with 
> shared data, transactional databases have been a very effective tool
> despite all the inefficiency mentioned above.

I think I'm still agreeing here :-)

> Lately I've been reading about "software transactional memory" (STM),
> a scheme for treating shared memory as if it were a database, without
> using locks except for during updates.  In some versions, STM
> transactions are composable, so nowhere near as bug-prone as
> fine-grained locks; and because readers don't need locks (they instead
> have to abort and restart transactions in the rare event of a
> simultaneous update) STM actually performs -faster- than traditional
> locking.  I posted a couple of URL's in another thread and will try
> writing a more detailed post sometime.  It is pretty neat stuff.
> There are some C libraries for it that it might be possible to port to
> Python.

I've been hearing about it as well, but haven't dug into it. If the promises
hold out as people hope, I'd hope to add support for it to Kamaelia if it
makes any sense (it probably would, because the co-ordinating assistant
tracker is a potential location where this would be helpful).

Whilst we're a component system, my personal aim is to make it easier for
people to write maintainable software that uses concurrency naturally
because it's easier. OK, that might be mad, but if the side effect is trying
things that might make people's lives easier I can live with that :-)

Interestingly, because of some of the discussions in this thread I took a
look at Erlang. Probably due to its similarities, in passing, to occam,
there are some stark similarities to Kamaelia there as well - mailboxes, the
ability for lightweight threads to hibernate, that sort of thing.

The difference really is that I'm not claiming this is special to the
particular language (we've got a proof of concept in C++ after all), and
we're aiming to use metaphors that are accessible, along with a bunch of
code as a proof of concept that we're finding useful. Whilst this is *soo*
the wrong place to say it, I'd really like to see a C++ or Java version
simply to see what has to change in a statically typed environment.

I suppose the other thing is that I'm not saying we're right, just that
we're finding it useful, and hoping others do too :-) (a couple of years
ago it was "we don't know if it works, but it might", now at least I'm at
the stage "This works, at least for us and a few others, it might for you",
which in my mind is quite a jump :)

If you do dig out those STM references, I'd be interested :-)

Regards,

Michael.