2.6, 3.0, and truly independent interpreters

Andy O'Meara andy55 at gmail.com
Fri Oct 24 16:51:10 EDT 2008


Another great post, Glenn!!  Very well laid-out and posed!! Thanks for
taking the time to lay all that out.

>
> Questions for Andy: is the type of work you want to do in independent
> threads mostly pure Python? Or with libraries that you can control to
> some extent? Are those libraries reentrant? Could they be made
> reentrant? How much of the Python standard library would need to be
> available in reentrant mode to provide useful functionality for those
> threads? I think you want PyC
>

I think you've defined everything perfectly, and you're of course
correct about my love for the PyC model.  :^)

Like any software that's meant to be used without restrictions, our
code and frameworks always use a context object pattern so that
there's never any non-const global/shared data.  I would go as far as
to say that this is the case with more performance-oriented software
than you may think, since it's usually a given for us to have to be
parallel-friendly in as many ways as possible.  Perhaps Patrick can
back me up there.
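
To make that concrete, here's a toy sketch of what I mean by a
context object pattern (the class and function names are made up for
the example, not from our actual code): every piece of mutable state
hangs off a context the caller owns and passes in, so two contexts
driven from two threads never touch shared data.

    # Hypothetical sketch of the context-object pattern: all mutable
    # state lives on a context object, never in module-level globals.
    class RenderContext(object):
        def __init__(self, width, height):
            self.width = width
            self.height = height
            self.frame_count = 0
            self.buffer = bytearray(width * height * 4)

    def render_frame(ctx, pixels):
        # Touches only state reachable from ctx, so two threads each
        # holding their own RenderContext never contend.
        ctx.buffer[:len(pixels)] = pixels
        ctx.frame_count += 1
        return ctx.frame_count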

As to what modules are "essential"...  As you point out, once
reentrant module implementations caught on in a PyC or hybrid world, I
think we'd start to see real effort to whip them into compliance--
there's just so much to be gained, imho.  But to answer the question,
there are the obvious ones (operator, math, etc), string/buffer
processing (string, re), C bridge stuff (struct, array), and OS basics
(time, file system, etc).  Nice-to-haves would be buffer and image
decompression (zlib, libpng, etc), crypto modules, and xml.  As far as
I can tell, all of these modules already contain little, if any,
global data, so I have to believe they'd be super easy to make "PyC
happy".  Patrick, what would you see you guys using?


> > That's the rub...  In our case, we're doing image and video
> > manipulation--stuff not good to be messaging from address space to
> > address space.  The same argument holds for numerical processing with
> > large data sets.  The workers handing back huge data sets via
> > messaging isn't very attractive.
>
> In the multiprocessing module environment, could you not use shared
> memory, then, for the large shared data items?
>

As I understand things, multiprocessing puts the work in a child
process (i.e. a separate address space), so the only way to get stuff
to/from it is via IPC, which can include a shared/mapped memory region.
Unfortunately, a shared memory region doesn't work when you have
large and opaque objects (e.g. a rendered CoreVideo movie in the
QuickTime API or 300 megs of audio data that just went through a
DSP).  Then you've got the hit of serialization if you've got
intricate data structures (ones that would normally need to be
serialized, such as a hashtable or something).  Also, if I may speak
for commercial developers out there who are just looking to get the
job done without new code, it's almost always preferable to use a
single high-level sync object (for when the job is complete) than to
start a child process and use IPC.  The former is just WAY less
code, plain and simple.
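
For concreteness, here's roughly the shape of the "single high-level
sync object" style I mean (a toy sketch, not our code, and it assumes
truly independent interpreters/free threading rather than today's
GIL): the worker writes straight into a buffer the main thread
already owns, and the only coordination is one threading.Event.

    import threading

    result = bytearray(300 * 1024 * 1024)    # big buffer, shared in-process
    done = threading.Event()

    def worker():
        # ... run the DSP / render pass, writing directly into result ...
        done.set()                           # one sync object: job complete

    threading.Thread(target=worker).start()
    done.wait()
    # result is already in our address space -- no IPC, no serialization.

Compare that with the multiprocessing route, where even if a flat
buffer can live in a multiprocessing.Array, anything with real
structure still has to be pickled across the process boundary.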


Andy




