[Python-Dev] Yet another "A better story for multi-core Python" comment

Gary Robinson garyrob at me.com
Wed Sep 9 22:52:39 CEST 2015


I’m going to seriously consider installing Windows, or using a dedicated hosted Windows box, the next time I have this problem so that I can try your solution. It does seem pretty ideal, although the STM branch of PyPy (using http://codespeak.net/execnet/ to access SciPy) might also work at this point.
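For the curious, the execnet route would look roughly like this — a minimal sketch, where "python3" stands in for whatever CPython-with-SciPy interpreter is installed, and the NumPy call is just an example workload:

    import execnet

    # Spawn a plain CPython interpreter; the PyPy-STM side only ships
    # data over the channel. ("python3" is a placeholder for whatever
    # CPython interpreter has SciPy/NumPy installed.)
    gw = execnet.makegateway("popen//python=python3")
    channel = gw.remote_exec("""
import numpy as np
data = channel.receive()            # execnet injects `channel` remotely
channel.send(float(np.mean(data)))  # do the heavy lifting in CPython
""")
    channel.send([1.0, 2.0, 3.0])
    print(channel.receive())        # -> 2.0
    gw.exit()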

Thanks!

I still hope CPython has a solution at some point… maybe PyParallel functionality will be integrated into Python 4 circa 2023… :)



-- 

Gary Robinson
garyrob at me.com
http://www.garyrobinson.net

> On Sep 9, 2015, at 4:33 PM, Trent Nelson <trent at snakebite.org> wrote:
> 
> On Tue, Sep 08, 2015 at 10:12:37AM -0400, Gary Robinson wrote:
>> There was a huge data structure that all the analysis needed to
>> access. Using a database would have slowed things down too much.
>> Ideally, I needed to access this same structure from many cores at
>> once. On a Power8 system, for example, with its larger number of
>> cores, performance may well have been good enough for production. In
>> any case, my experimentation and prototyping would have gone more
>> quickly with more cores.
>> 
>> But this data structure was simply too big. Replicating it in
>> different processes used memory far too quickly and was the limiting
>> factor on the number of cores I could use. (I could fork with the big
>> data structure already in memory, but copy-on-write issues due to
>> reference counting caused multiple copies to exist anyway.)
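
(For anyone who hasn't run into this: the forked child only ever *reads* the structure, but CPython still writes to every object's refcount field on each access, dirtying the shared pages and forcing the kernel to copy them. A minimal sketch of the effect — POSIX-only, with an arbitrary structure size:)

    import os

    # Build a big structure in the parent before forking.
    big = [str(i) for i in range(5_000_000)]

    pid = os.fork()
    if pid == 0:
        # Child: this loop only *reads* the list, but every
        # Py_INCREF/Py_DECREF writes to each object's refcount,
        # so the kernel copies those pages despite copy-on-write
        # and memory use balloons anyway.
        total = sum(len(s) for s in big)
        os._exit(0)
    else:
        os.waitpid(pid, 0)
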
> 
> This problem is *exactly* the type of thing that PyParallel excels at,
> just FYI.  PyParallel can load large, complex data structures now, and
> then access them freely from within multiple threads.  I'd recommend
> taking a look at the "instantaneous Wikipedia search server" example as
> a start:
> 
> https://github.com/pyparallel/pyparallel/blob/branches/3.3-px/examples/wiki/wiki.py
> 
> That loads a trie with 27 million entries, creates ~27.1 million
> PyObjects, loads a huge NumPy array, and has a working set size (WSS)
> of ~11GB.  I've
> actually got a new version in development that loads 6 tries of the
> most frequent terms for character lengths 1-6.  Once everything is
> loaded, the data structures can be accessed for free in parallel
> threads.
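
(To make the shape of that concrete: it's the classic load-once/read-many pattern. The sketch below shows that pattern with ordinary CPython threads — which the GIL serializes; PyParallel's claim is that the same read-only accesses run truly in parallel. The names here are illustrative, not PyParallel's API:)

    import threading

    # Load the large read-only structure once, up front
    # (a dict stands in for the trie).
    INDEX = {str(i): i for i in range(1_000_000)}

    def worker(keys, results, slot):
        # Pure reads against the shared structure; no locking needed
        # because nothing mutates INDEX after loading. Under stock
        # CPython the GIL serializes these threads.
        results[slot] = sum(INDEX.get(k, 0) for k in keys)

    results = [0] * 4
    threads = [threading.Thread(target=worker, args=(["42", "7"], results, i))
               for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sum(results))
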
> 
> There are more details regarding how this is achieved on the landing
> page:
> 
> https://github.com/pyparallel/pyparallel
> 
> I've done a couple of consultancy projects now that were very
> data-science oriented (with huge data sets), so I've really gained an
> appreciation for how common the situation you describe is.  It is
> probably the best demonstration of PyParallel's strengths.
> 
>> Gary Robinson garyrob at me.com http://www.garyrobinson.net
> 
>    Trent.
