[Chicago] Kickstarter Fund to get rid of the GIL

Carl Karsten carl at personnelware.com
Mon Jul 25 22:42:18 CEST 2011


ACM seems like a good venue, in October, since that is the next meeting I
stand a chance of attending.

Or I think there is a High Performance Computing group, and if not,
maybe I should start one, just so we can all go see Massimo talk, and
then I would be sure I can make it.

Hmm...  This thread has 3 or 4 potential presentations.

On Mon, Jul 25, 2011 at 1:37 PM, Massimo Di Pierro
<mdipierro at cs.depaul.edu> wrote:
> I can (that is what I have worked on most of my life), but it would be a physics
> talk, not a Python talk, so I am not sure if ChiPy is the appropriate venue.
>
> Massimo
>
> On Jul 25, 2011, at 1:02 PM, Brian Herman wrote:
>
> Whoa, Massimo, can you give a talk on that stuff?
>
>
> Arigatou gozaimasu,
> (Thank you very much)
> Brian Herman
>
> brianjherman.com
> brianherman at acm.org
>
> On Mon, Jul 25, 2011 at 9:09 AM, Massimo Di Pierro <mdipierro at cs.depaul.edu>
> wrote:
>>
>> Probably the single largest community in the US using BlueGene machines is the
>> lattice QCD community (in fact, the machine was designed in collaboration with
>> the lattice QCD group at Columbia University). The typical data structure
>> consists of a 4D array (called a lattice) of 4-vectors of SU(3) matrices (3x3
>> complex double precision), plus a few other similar data structures that live
>> on each site of the 4D lattice. A typical lattice has 96x64^3 sites, for a total
>> size of 96x64^3x4x9x2x8 bytes = ~14GB. The total memory usage is larger because
>> of copies and other data structures. One of the 4 dimensions is stored locally;
>> the other 3 are distributed in parallel. Each iteration of the algorithm
>> involves computing roughly 2000 elementary floating point operations per
>> lattice site and communicating the site structure (4x9x2x8 bytes) to each of
>> the 2^3 neighbor processors. The most efficient code can use up to 100-1000
>> CPUs with 50-80% efficiency. So if one computing node stores 96 sites, it needs
>> to perform ~200K FLOPs and do 8 sends and 8 recvs of 96x4x9x2x8 bytes each.
>> This type of computation is limited by latency more than bandwidth.
>> Communication is always nearest-neighbor (this is common for all algorithms
>> that solve, or are equivalent to solving, differential equations numerically).
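
Quick sanity check on those numbers, just plugging in the figures Massimo
gives above (the variable names are mine, and this is only back-of-the-envelope
arithmetic, not his code):

    sites = 96 * 64**3                  # 4D lattice: 96 x 64^3 sites
    site_bytes = 4 * 9 * 2 * 8          # 4-vector of 3x3 complex doubles, 8-byte reals
    print(sites * site_bytes / 1e9)     # ~14.5, i.e. the ~14GB lattice
    sites_per_node = 96
    print(sites_per_node * 2000)        # 192,000, i.e. ~200K FLOPs per node per iteration
    print(sites_per_node * site_bytes)  # 55,296 bytes, ~55KB sent to each of the 8 neighbors

So each node does a couple hundred thousand FLOPs per iteration against eight
~55KB exchanges, which is why latency rather than bandwidth is the limit.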
>>
>>
>>
>> On Jul 25, 2011, at 8:51 AM, sheila miguez wrote:
>>
>> > On Sun, Jul 24, 2011 at 11:51 AM, Alex Gaynor <alex.gaynor at gmail.com>
>> > wrote:
>> >
>> >> I'll live :)  Anyway, the point I was getting at is not that a message
>> >> passing system is not scalable (I've written code for Blue Gene/Ls, so I
>> >> know that message passing scales), but rather that, for problems for which
>> >> shared-memory concurrency is appropriate (read: the valid cases to complain
>> >> about the GIL), message passing will not be, because of the
>> >> marshal/unmarshal overhead (plus data size/locality ones).
>> >> Alex
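
To make the marshal/unmarshal point concrete, here is a rough timing sketch of
my own (not anything from Alex's code): round-tripping one ~55KB boundary
buffer, the per-neighbor size Massimo quotes below, through pickle. That
serialize/copy/deserialize step is the cost a message-passing design pays on
every exchange and shared memory avoids:

    import pickle
    import time
    from array import array

    # one node's boundary data per neighbor: 96 sites x 4 x 9 x 2 doubles = ~55KB
    payload = array('d', range(96 * 4 * 9 * 2))

    n = 10000
    t0 = time.time()
    for _ in range(n):
        blob = pickle.dumps(payload, pickle.HIGHEST_PROTOCOL)
        back = pickle.loads(blob)
    print("pickle round trip: %.1f us" % ((time.time() - t0) / n * 1e6))

Whatever the exact number comes out to on your machine, it is pure overhead
relative to threads reading the same buffer in place.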
>> >
>> > Are Blue Gene jobs that involve fairly sizable data packets a rare use
>> > case? I don't know a lot about the typical use cases and am curious. I'm
>> > guessing the common case is analysis on things that can be split out, but
>> > I am curious about the size of the chunks of information.
>> >
>> >
>> >
>> > --
>> > sheila



-- 
Carl K

