[Chicago] Kickstarter Fund to get rid of the GIL

sheila miguez shekay at pobox.com
Mon Jul 25 20:49:09 CEST 2011


Help me understand how much data is being communicated among nodes
during computations. A data structure representing a lattice can get
up to 14GB, but how much of that is passed among nodes?

I work on things that pass a lot of messages around, and I've seen 8MB
messages at the large end of what I work with. :)

(I'm sure there are some bigger messages elsewhere in the company; I'm
only talking about things I've worked with directly.)

I'm not trying to derail; I'm asking because I want to get a sense of
the message sizes at which people think message passing starts to
become limiting.
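
To make the sizes concrete, here's some rough back-of-the-envelope
arithmetic in Python, using only the figures from Massimo's mail quoted
below (nothing here is measured; the per-message size is just his
96x4x9x2x8 number):

    # per-site payload: a 4-vector of SU(3) matrices, 3x3 complex doubles
    site_bytes = 4 * 9 * 2 * 8          # 576 bytes per lattice site

    # whole lattice: 96 x 64^3 sites
    lattice_sites = 96 * 64**3          # ~25.2 million sites
    lattice_bytes = lattice_sites * site_bytes
    print(lattice_bytes / 1e9)          # ~14.5 -- the "14GB" lattice

    # one neighbor exchange per iteration: 96 sites' worth of data
    msg_bytes = 96 * site_bytes
    print(msg_bytes / 1e3)              # ~55 KB per message

    # 8 sends + 8 recvs per node per iteration
    print(16 * msg_bytes / 1e6)         # ~0.9 MB moved per node per step

So even though the lattice itself is ~14GB, each individual message is
only tens of KB, which is why the computation ends up bound by latency
rather than bandwidth -- and why my 8MB messages are in a pretty
different regime.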


On Mon, Jul 25, 2011 at 9:09 AM, Massimo Di Pierro
<mdipierro at cs.depaul.edu> wrote:
> Probably the single largest community in the US using BlueGene machines
> is the lattice QCD community (in fact it was designed in collaboration
> with the lattice QCD group at Columbia University). The typical data
> structure consists of a 4D array (called a lattice) of a 4-vector of
> SU(3) matrices (3x3 complex double precision), plus a few other similar
> data structures that live on a site of the 4D lattice. A typical
> lattice has 96x64^3 sites, for a total size of 96x64^3x4x9x2x8 = 14GB.
> The total memory usage is larger because of copies and other data
> structures. One of the four dimensions is stored locally; the other
> three are distributed in parallel. Each iteration of the algorithm
> involves computing ~2000 elementary floating point operations per
> lattice site and communicating the site structure (4x9x2x8 bytes) to
> each of the 2^3 neighbor processors. The most efficient code can scale
> to 100-1000 CPUs with 50-80% efficiency. So if one computing node
> stores 96 sites, it needs to perform ~200K FLOPs and do 8 sends and 8
> recvs of 96x4x9x2x8 bytes each. This type of computation is limited by
> latency more than by bandwidth. Communication is always nearest
> neighbor (this is common for all algorithms that solve, or are
> equivalent to solving, differential equations numerically).
>
>
>
> On Jul 25, 2011, at 8:51 AM, sheila miguez wrote:
>
>> On Sun, Jul 24, 2011 at 11:51 AM, Alex Gaynor <alex.gaynor at gmail.com> wrote:
>>
>>> I'll live :)  Anyway, the point I was getting at is not that a message passing
>>> system is not scalable, I've written code for Blue Gene/Ls so I know that
>>> message passing scales.  But rather that, for problems for which
>>> shared-memory concurrency is appropriate (read: the valid cases to complain
>>> about the GIL), message passing will not be, because of the
>>> marshal/unmarshal overhead (plus data size/locality ones).
>>> Alex
>>
>> Are Blue Gene jobs that involve fairly sizable data packets a rare
>> use case? I don't know much about the typical workloads, and am
>> curious. I'm guessing the common case is analysis on things that can
>> be split out, but I'm curious about the size of the chunks of
>> information.
>>
>>
>>
>> --
>> sheila



-- 
sheila

