[Chicago] Best practices for this concurrent problem?

Daniel Griffin dgriff1 at gmail.com
Tue Sep 15 05:49:28 CEST 2009


Well, processes and threads are very similar on Unix-like OSs, but fork makes a
copy of your current process, and then certain things might break (like DB
connections). Python threads are "real" threads, except that only one thread
can use the interpreter at a time.
I wrote a test app using multiprocessing pools and I think it will be the
best I can hope for: I can start, say, 8 processes and use them in a pool.
There is still the chance that all 8 get tied up by long-running tasks, but I
think I can risk that.
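
For reference, a minimal sketch of that kind of pool (not the actual test
app). The names handle_element, init_worker and run_pool are made up for
illustration, and the sleep stands in for the real processing and socket work.
The initializer is one possible place to open per-process resources such as DB
connections, so they are not shared across the fork:

```python
import multiprocessing
import os
import time


def init_worker():
    # Hypothetical hook: runs once in each worker process. A real program
    # could open its own DB connection here instead of inheriting one
    # from the parent across fork.
    pass


def handle_element(element):
    # Stand-in for the real per-element work (processing plus socket I/O).
    time.sleep(0.01)
    return element, os.getpid()


def run_pool(elements, workers=8):
    # A fixed number of worker processes handles however many elements
    # there are, so thousands of elements never mean thousands of processes.
    with multiprocessing.Pool(processes=workers, initializer=init_worker) as pool:
        return pool.map(handle_element, elements)


if __name__ == "__main__":
    results = run_pool(range(100))
    pids = {pid for _, pid in results}
    print(len(results), "elements handled by", len(pids), "worker processes")
```

pool.map returns results in input order and blocks until every element is
done, so a few very long tasks can still hold up the whole batch.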

Dan

On Mon, Sep 14, 2009 at 10:38 PM, Martin Maney <maney at two14.net> wrote:

> On Sun, Sep 13, 2009 at 01:51:02PM -0500, Daniel Griffin wrote:
> > I have been playing with a program which essentially reads some elements
> > from a database then starts a "thread" for each element, this can be a few
> > elements or thousands. Inside that thread it does some processing then
> > starts n sockets to do tasks. The life of the thread is generally a few
> > seconds but can stretch up to hours or even days.
>
> > threads - GIL death when I spin up a ton of threads, I can limit how many
> > threads I make but I know I am not getting anywhere near full utilization
> > of the box.
>
> Yes, native Python threads don't use multiple processors.  That's why
> Tornado's setup was to run 4 instances of Tornado (on a quad core box)
> behind nginx, which proxies the incoming requests to those worker
> processes round robin.  Something like that is probably a reasonable
> approach for Python.
>
> > processes - I don't think it's a good idea to make processes that are
> > short lived, it seems too expensive. This is even worse on Windows.
>
> Linux's forking is very fast - in fact forking a new process and
> starting a "real" thread are the same process with a few options set
> differently IIRC.  OTOH, I believe Python's threading is the user-mode
> sort, which should be even quicker to start, but with more limitations
> (such as the often-complained-about way Python's threading doesn't
> scale to multiple cores).  Windows is, indeed, a whole different story,
> and if the codebase has to be performant on both Linux and Windows you
> may well need to provide different modes of operation.
>
> > async - I would have to refactor to make this work and haven't tried yet.
>
> Async is the state machine approach.  It can yield very good
> performance, but its Achilles' heel is that it doesn't deal well with
> "long" computations even if they're rare (they cause latency for all
> pending requests, or force an often painful refactoring to break the
> work into small chunks).  That's the design choice Twisted and Tornado
> make.
>
> > summary - What is the best way to deal with (sometimes) large numbers of
> > "threads" that do a small amount of processing and a large amount of
> > socket io?
>
> There is no best way, there is only a choice of design tradeoffs.
> From where you are now, the simplest might be something like Tornado's
> deployment model - run a fairly small number of threaded workers with a
> work manager that dispatches tasks (data sets or queries or whatever
> exactly they are) to these workers.  That should scale at least to
> around one worker per core; maybe further if the thread bog-down is due
> to internal overhead in Python's threading.
>
> --
> Beer does more than Pascal can to justify Date's ways to man.
>
> _______________________________________________
> Chicago mailing list
> Chicago at python.org
> http://mail.python.org/mailman/listinfo/chicago
>