What's the cost of using hundreds of threads?

Wed Mar 2 06:28:30 EST 2005

> I'm a bit confused by your math.  Fifty connections should be 102
> threads, which is quite reasonable.

My formula applies to one forwarded ('load-balanced') connection. Every 
such connection creates n further connections (pipes) which share the 
load. Every pipe requires two threads to be spawned. Every 'main 
connection' also spawns two threads of its own - so my formula, 
2*pipes+2, gives the number of threads spawned per 'main connection'.

Now if conn_count connections are established, the thread count 
equals:
conn_count * threads_per_main_connection = conn_count * (2*pipes+2)

For 50 connections with about 10 pipes each, that gives 
50 * (2*10+2) = 1100 threads.
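
To make the arithmetic easy to check, here is the same formula as a few lines of Python. The function names are made up for this illustration and are not taken from the actual program.

```python
def threads_per_main_connection(pipes):
    # Two threads per pipe, plus two for the main connection itself.
    return 2 * pipes + 2


def total_threads(conn_count, pipes):
    # Every main connection spawns the same number of threads.
    return conn_count * threads_per_main_connection(pipes)


if __name__ == "__main__":
    print(total_threads(50, 10))  # 50 * 22 = 1100
```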

> My experience with lots of threads dates back to Python 1.5.2, but I
> rarely saw much improvement with more than a hundred threads, even for
> heavily I/O-bound applications on a multi-CPU system.  However, if your
> focus is algorithmic complexity, you should be able to handle a couple of
> thousand threads easily enough.

I don't spawn them for computational reasons, but because it makes my 
code much simpler. I use built-in TCP features to achieve load 
balancing - every flow (directed through a pipe) has its own dedicated 
threads, one for download and one for upload. For every 'main 
connection' these threads share a send buffer and a receive buffer. If 
any of the pipes is congested, the corresponding threads block in their 
send / recv calls without affecting the other data flows.
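
Purely as an illustration of that pattern, a minimal Python sketch follows. It is not the actual code: it shows only the upload direction, uses socket.socketpair() as a stand-in for real TCP pipes, and uses a queue.Queue as the shared send buffer. The names upload_worker and run_demo are hypothetical.

```python
import queue
import socket
import threading


def upload_worker(send_buffer, pipe_sock):
    """Drain the shared send buffer into one pipe.

    If this pipe is congested, sendall() blocks only this thread;
    the other workers keep pulling chunks from the shared buffer.
    """
    while True:
        chunk = send_buffer.get()
        if chunk is None:  # sentinel: no more data for this worker
            break
        pipe_sock.sendall(chunk)
    pipe_sock.close()


def run_demo(pipes=3, chunks=20, chunk_size=100):
    send_buffer = queue.Queue(maxsize=8)  # shared by all upload threads
    pairs = [socket.socketpair() for _ in range(pipes)]
    workers = [
        threading.Thread(target=upload_worker, args=(send_buffer, local))
        for local, _remote in pairs
    ]
    for w in workers:
        w.start()

    # The 'main connection' side just fills the shared buffer.
    for _ in range(chunks):
        send_buffer.put(b"x" * chunk_size)
    for _ in workers:
        send_buffer.put(None)  # one sentinel per worker
    for w in workers:
        w.join()

    # The demo data is small enough to fit in the socket buffers,
    # so it is safe to read the far ends only after the workers finish.
    received = 0
    for _local, remote in pairs:
        while True:
            data = remote.recv(4096)
            if not data:
                break
            received += len(data)
        remote.close()
    return received


if __name__ == "__main__":
    print(run_demo())
```

Which pipe carries which chunk is decided simply by whichever thread is free to call get() next, so a slow pipe naturally takes fewer chunks.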

Using threads gives me VERY simple code. Achieving this with poll / 
select would be much more difficult. And to guarantee concurrency and 
maximal throughput for all of the pipes I would probably have to mirror 
code from the Linux TCP stack (I mean window shifting, data 
acknowledgement, retransmission queues). Or perhaps I exaggerate.