Simple thread pools

Steve Holden steve at holdenweb.com
Tue Nov 9 08:53:45 EST 2004


Josiah Carlson wrote:

> Steve Holden <steve at holdenweb.com> wrote:
> 
> [snip prior portions of the conversation as it was getting long]
> 
> 
>>It's not particularly surprising that communicating the same amount of 
>>information across more threads (and pipelines) on the same machine 
>>shows the thread-management activity starting to become significant.
>>
>>However, in the case where I'm trying to send customer statements out by 
>>email I still maintain that it's quicker (i.e. a given number of mails 
>>will be sent out in less elapsed time) to have 200 threads running in 
>>parallel (each typically communicating with a separate mail server) than 
>>it is to use (say) 30 threads.
>>
>>While I agree that overall I may end up using more local CPU, I'm happy 
>>to use it because it means I can send over 10,000 emails an hour. Are 
>>you suggesting it would go more quickly with fewer threads? This 
>>certainly contradicts my testing results.
>>
>>Although your program imports the socket library it doesn't appear to 
>>use it, so I remain unconvinced of what you say. I do accept that we may 
>>be talking at cross purposes, however, since I'm unable to get 
>>www.pycs.net to respond and show me the original code on which the OP's 
>>question was based.
> 
> 
> I had initially planned to create a listening socket, and generate a
> bunch of local sockets, then I remembered os.pipe and said to myself, "to
> hell with it, pipes should be faster, they bypass the network stack".
> 
Of course the example is then all running on the one machine, which 
might influence elapsed times in a way that remote servers 
wouldn't, but it was a neat example just the same.
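Josiah's benchmark itself isn't reproduced in this thread, so what follows is only a minimal sketch, assuming Python 3, of the idea he describes: threads exchanging data over os.pipe() file descriptors instead of sockets, all on one machine. The function names (writer, reader, pipe_demo) are hypothetical.

```python
import os
import threading
from typing import List


def writer(fd: int, worker_id: int, count: int) -> None:
    """Write `count` newline-terminated messages to the pipe, then close it."""
    with os.fdopen(fd, "w") as w:
        for i in range(count):
            w.write(f"worker {worker_id} message {i}\n")
    # Leaving the `with` block closes the write end, so the reader sees EOF.


def reader(fd: int, sink: List[str]) -> None:
    """Read lines until the writer closes its end of the pipe."""
    with os.fdopen(fd, "r") as r:
        for line in r:
            sink.append(line.rstrip("\n"))  # list.append is thread-safe in CPython


def pipe_demo(n_workers: int = 4, count: int = 3) -> List[str]:
    """Start one writer/reader thread pair per pipe and collect every message."""
    results: List[str] = []
    threads = []
    for wid in range(n_workers):
        rfd, wfd = os.pipe()  # local pipe: no network stack involved
        threads.append(threading.Thread(target=writer, args=(wfd, wid, count)))
        threads.append(threading.Thread(target=reader, args=(rfd, results)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Messages from different pipes may interleave in any order, but each pipe delivers its own messages in the order they were written.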

> 
> As they sometimes say, "there is more than one way to skin a cat",
> though let us hope that there isn't any cat skinning.
> 
> If your script keeps the processor maxed out, then you may do
> better by reducing threads (you are processor limited, not
> bandwidth/latency limited).  As thread count increases, you spend more
> processor time on thread-handling overhead.  If it isn't maxed out, and
> you are running at the file handle limit and/or the bandwidth limit, congrats.
> 
Well I'm happy to say I don't appear to be running at *any* limit just 
at the moment. But clearly when you are CPU-limited then there's a 
balance to be struck between thread-handling overhead and application 
processing.
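That balance usually comes down to one knob, the number of worker threads. A minimal sketch of a simple fixed-size thread pool built on queue.Queue (Python 3; run_pool and n_workers are illustrative names, not taken from the thread) might look like this. In CPython the global interpreter lock means extra threads do not speed up pure-Python CPU-bound work, so a large pool only pays off when workers spend most of their time waiting on I/O.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

_STOP = object()  # sentinel telling a worker thread to exit


def run_pool(func: Callable[[Any], Any], items: Iterable[Any], n_workers: int) -> List[Any]:
    """Apply func to every item using n_workers threads; results keep input order."""
    tasks: "queue.Queue[Any]" = queue.Queue()
    work = list(items)
    results: List[Any] = [None] * len(work)

    def worker() -> None:
        while True:
            job = tasks.get()
            if job is _STOP:
                break
            index, item = job
            # Each index is written by exactly one worker, so no lock is needed.
            # This sketch does not handle exceptions: if func raises, that
            # worker thread dies and its result slot stays None.
            results[index] = func(item)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for pair in enumerate(work):
        tasks.put(pair)
    for _ in threads:
        tasks.put(_STOP)  # one sentinel per worker, queued after all real jobs
    for t in threads:
        t.join()
    return results
```

Tuning n_workers up or down is how the thread-handling overhead gets traded against waiting time.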
> 
> Now, just because you are using fewer threads doesn't mean that you
> can't get equivalent throughput.  Heck, using a heavily modified variant
> of asyncore, we've been able to handle 50,000 POP3 account checks (login,
> stat, and if necessary: list, uidl, download email, delete email,
> disconnect) every 15 minutes from a laptop.  Our biggest issue is
> latency of our connection, but even then, we do well considering that
> this is all with a single thread.
> 
The owner of that laptop needs to be given some work. Even *I* don't 
check my mail every second :-)
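Josiah's approach multiplexes many connections in a single thread with a modified asyncore. asyncore has since been deprecated and was removed in Python 3.12, so the sketch below swaps in asyncio to show the same single-threaded pattern: one event loop serving many concurrent TCP clients on localhost. The one-line protocol is a toy stand-in for POP3, and handle, client and main are hypothetical names.

```python
import asyncio
from typing import List


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Toy server: answer one request line with '+OK ...', then hang up."""
    line = await reader.readline()
    writer.write(b"+OK " + line)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def client(port: int, n: int) -> bytes:
    """One simulated account check: connect, send a line, read the reply."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"check %d\n" % n)
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply


async def main(n_clients: int = 100) -> List[bytes]:
    # Port 0 lets the OS pick a free port; everything runs in this one thread.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        replies = await asyncio.gather(*(client(port, i) for i in range(n_clients)))
    return list(replies)
```

Run it with asyncio.run(main(50)); asyncio.gather returns the replies in the same order the clients were started.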

I see we were indeed at cross purposes. The load I'm talking about is 
bursty in the extreme - I am communicating with the MX hosts for each of 
the receiving domains, and it takes less than 100ms to send some mails.

For others I have to try several MX hosts to get any response at all 
(each one with a 20-second timeout), and the reason I went to the 
multi-threaded solution in the first place was to avoid having the whole
process wait on a single recalcitrant MX host.
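A minimal sketch of that design, assuming Python 3's concurrent.futures: each domain's MX hosts are tried in preference order with a per-host timeout, and the domains are spread across a pool of threads so one unresponsive host only ties up one worker. The probe only opens and closes a TCP connection to port 25; a real sender would carry on with the SMTP dialogue (for example via smtplib). MX lookup is assumed to have been done already, and first_reachable, deliver_all and default_probe are hypothetical names.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

MX_TIMEOUT = 20.0  # seconds per MX host, matching the timeout described above

Probe = Callable[[str, float], None]


def default_probe(host: str, timeout: float) -> None:
    """Open and immediately close a TCP connection to the host's SMTP port."""
    # Raises OSError (which includes TimeoutError) if the host is unreachable or slow.
    with socket.create_connection((host, 25), timeout=timeout):
        pass


def first_reachable(mx_hosts: List[str], probe: Probe = default_probe,
                    timeout: float = MX_TIMEOUT) -> Optional[str]:
    """Try each MX host in preference order; return the first one that answers."""
    for host in mx_hosts:
        try:
            probe(host, timeout)
        except OSError:
            continue  # refused, unresolvable or timed out: fall back to the next MX
        return host
    return None  # every MX host failed; a real sender would queue a retry


def deliver_all(domains: Dict[str, List[str]], n_threads: int = 200,
                probe: Probe = default_probe) -> Dict[str, Optional[str]]:
    """Work through every domain's MX list concurrently with n_threads workers."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = {domain: pool.submit(first_reachable, hosts, probe)
                   for domain, hosts in domains.items()}
        return {domain: f.result() for domain, f in futures.items()}
```

Passing a different probe function makes the fallback logic testable without touching the network.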

The client is very happy, as we've seen an increase in reliability and a 
48-fold speed-up in elapsed time as a result of the (rather painful) 
transition to multi-threading. This clearly justifies using 
200 parallel threads, since for many threads the dominant factor in the 
elapsed time is the remote server response (or absence thereof).
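When remote response time dominates, the number of threads needed can be estimated with Little's law, L = λW: the average number of deliveries in flight equals the send rate times the mean time each delivery takes. The helper below is a back-of-the-envelope sketch (threads_needed is a hypothetical name), and the per-message times in the usage note are assumptions, not measurements from this thread.

```python
import math


def threads_needed(messages_per_hour: float, mean_seconds_per_message: float) -> int:
    """Little's law, L = lambda * W: average number of deliveries in flight."""
    arrival_rate = messages_per_hour / 3600.0  # lambda, in messages per second
    return math.ceil(arrival_rate * mean_seconds_per_message)
```

At 10,000 mails an hour, an assumed mean of 2 seconds per mail gives threads_needed(10000, 2) == 6, while slow MX hosts pushing the mean to 60 seconds (three 20-second timeouts) give 167. Because the load is bursty, a pool sized well above the average leaves headroom for the slow cases.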

The cases you are talking about may well require a more careful 
consideration of threading overhead.

regards
  Steve
-- 
http://www.holdenweb.com
http://pydish.holdenweb.com
Holden Web LLC +1 800 494 3119


