cost of creating threads

Mon Oct 4 23:13:09 EDT 2004

Remco Boerma wrote:
 > Bryan Olson wrote:
 >> Remco Boerma wrote:
 >>  > While you have a tiny server, I would recommend a thread-pool
 >>  > of about 2 to 3 threads. This at least cuts down the time
 >>  > needed for thread creation.
 >>
 >> What happens if all three threads are blocked, perhaps waiting
 >> for server_b, when a new request comes in?
 >
 > I guess it would indeed wait until one of the first 3 threads has closed
 > its connection, or a timeout occurs. But you ought to test it, just to
 > make sure.

I think you are basically right; that is how most thread-pools
work.  I've concluded that statically-sized thread-pools are
usually a mistake.

 > With a little statistics usage, you would be able to create
 > new threads on the fly if many connections are established in a short
 > time, releasing the threads when idle for x seconds.  This would allow
 > you to use more resources when needed, and release them when done.

That sounds like a reasonable idea, but doing it efficiently
could be tricky.  The most straightforward Python implementation
would probably use the locks-with-timeout in Python's
'threading' module.  The way these are implemented, any positive
timeout means the thread actually goes into a sleep-and-poll
loop.  Keeping a pool this way could be significantly less
efficient than creating each thread on demand.
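
As a rough illustration of the "release when idle for x seconds" idea, here is a minimal sketch. It is written for Python 3 and uses queue.Queue.get(timeout=...) instead of raw locks with timeouts, so it is not the 2004-era implementation discussed above. The names IDLE_SECONDS, worker and record are made up for the example, and the 0.2 second limit is an arbitrary assumption.

```python
import queue
import threading

IDLE_SECONDS = 0.2      # hypothetical idle limit (the "x seconds" above)

tasks = queue.Queue()
results = []
results_lock = threading.Lock()

def worker():
    """Run tasks until none arrives within IDLE_SECONDS, then exit."""
    while True:
        try:
            func, args = tasks.get(timeout=IDLE_SECONDS)
        except queue.Empty:
            return      # idle for too long: release the thread
        func(*args)

def record(value):
    with results_lock:
        results.append(value)

# Start three workers, hand them ten tasks, then let them time out.
workers = [threading.Thread(target=worker) for _ in range(3)]
for t in workers:
    t.start()
for i in range(10):
    tasks.put((record, (i,)))
for t in workers:
    t.join()            # returns once every worker has gone idle and exited
```

Whether the timed wait inside get() is cheap depends on the Python version and platform, so it is worth timing before relying on it.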

Most descriptions of thread-pools imply that their primary
purpose is to avoid the overhead of creating and destroying
threads.  In fact, thread pools can provide other services, and
the efficiency motivation is dim and fading.  I posted a program
that times creation of thousands of threads, and running it
shows that modern PC-class machines with modern cheap/free OS's
can create a few thousand threads per second. Coding in C speeds
that up by a factor of several.  If a program needs dozens, or
even hundreds, of threads every second, it can simply create
them as needed.  No sense solving problems we don't have.
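
A timing test along these lines is easy to sketch. The following is not the program referred to above, only a minimal Python 3 stand-in using the 'threading' module; the function name time_thread_creation and the count of 1000 are assumptions for illustration. It starts and joins do-nothing threads one at a time, so it measures creation plus teardown.

```python
import threading
import time

def time_thread_creation(count=1000):
    """Start and join `count` do-nothing threads; return threads per second."""
    def noop():
        pass

    begin = time.perf_counter()
    for _ in range(count):
        t = threading.Thread(target=noop)
        t.start()
        t.join()
    elapsed = time.perf_counter() - begin
    return count / elapsed

rate = time_thread_creation()
print("about %.0f threads per second" % rate)
```

The figure varies a lot between machines and operating systems, so treat any single run as a rough indication only.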

A clever operating system can keep a cache of threads in the
background.  The system can probably make better decisions on
when to create a new thread than can the application programmer.
It can ask: do I have runnable threads, or are they all blocked?
If some thread calls to create another thread, I should create
it if all the current threads are blocked.  Otherwise, let the
runnable threads run; if they finish, they can then take on the
new tasks.

Sticking with my time-it-and-see approach, I wrote a simple
thread cache to see how much difference it makes in my
thousands-of-threads timer.  On my PC, it speeds it up by a
factor of five or six.  In my timer, each thread does almost
nothing, and the more work a thread does, the less difference
creation time makes.  Still, there may be cases where thread
creation uses a significant portion of run-time, and the cache
does speed things up.

My thread cache doesn't limit the total number of threads.
Instead, it limits the number waiting in the cache.  When the
client calls for a thread, if one or more threads are in the
cache, it takes one of those; if not, it tries to create a new
one.  When a thread is finished with its task, it checks how
many others are waiting in the cache, and decides whether to
exit or cache itself.
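
The policy just described can be sketched in Python 3 roughly as follows. This is an illustrative variant, not the module attached to this post: it hands tasks over through a queue.Queue rather than a single slot guarded by locks, and the class name ThreadCache, its start method and the max_waiting parameter are hypothetical names chosen for the example. Threads are marked as daemons on the assumption that idle cached threads should not keep the program alive.

```python
import queue
import threading

class ThreadCache:
    """Unbounded thread creation, bounded number of idle threads."""

    def __init__(self, max_waiting=20):
        self._max_waiting = max_waiting
        self._mutex = threading.Lock()
        self._waiting = 0               # idle threads not yet promised a task
        self._tasks = queue.Queue()

    def start(self, func, *args):
        with self._mutex:
            reuse = self._waiting > 0
            if reuse:
                self._waiting -= 1      # promise this task to an idle thread
        if not reuse:
            # Nothing cached: create a thread.  If this raises, no task
            # has been queued yet, so nothing is left orphaned.
            threading.Thread(target=self._run, daemon=True).start()
        self._tasks.put((func, args))

    def _run(self):
        while True:
            func, args = self._tasks.get()
            func(*args)                 # an exception here ends the thread
            with self._mutex:
                if self._waiting >= self._max_waiting:
                    return              # cache is full: exit
                self._waiting += 1      # otherwise wait for another task

cache = ThreadCache(max_waiting=2)
results = []
finished = threading.Semaphore(0)

def task(i):
    results.append(i * i)
    finished.release()

for i in range(5):
    cache.start(task, i)
for _ in range(5):
    finished.acquire()                  # wait for all five tasks
```

How many threads actually get reused in a run like this depends on timing: a caller that asks for a thread before a finishing thread has re-entered the cache simply gets a new one.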

For ease of integration, I wrote it as a module that supports
the same interface as the Python library's 'thread' module.  A
program can simply "import thread_cache as thread".  It's new,
so it's not well-tested.

--Bryan

"""
     Module thread_cache.py
     Same interface as Python library's thread module (which
     it uses), but it keeps a cache of threads.
"""

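# Upper bound on idle threads kept in the cache; extra threads exit.
max_nwaiting = 20

# Python 2 module; Python 3 renamed it to _thread.
from thread import *

_task = None                  # single hand-off slot: (func, args, kwargs)
_mutex = allocate_lock()      # guards _task and _nwaiting
_vacant = allocate_lock()     # unlocked while the slot is empty
_occupied = allocate_lock()   # unlocked while the slot holds a task
_occupied.acquire()           # the slot starts out empty
_nwaiting = 0                 # threads idle in the cache
_start = start_new_thread     # the library's original, wrapped below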

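def _run():
     # Body of every cached thread: take a task, run it, then decide
     # whether to return to the cache or exit.
     global _nwaiting, _task
     go = 1
     while go:
         _occupied.acquire()        # wait until a task is in the slot
         _mutex.acquire()
         (func, args, kwargs) = _task
         _task = None
         _mutex.release()
         _vacant.release()          # the slot is free for the next caller
         func(*args, **kwargs)
         #  Thread might exit with an exception, which is fine.
         _mutex.acquire()
         if _nwaiting < max_nwaiting:
             _nwaiting += 1         # room in the cache: wait for more work
         else:
             go = 0                 # cache is full: let this thread exit
         _mutex.release()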

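def start_new_thread(func, args=(), kwargs={}):
     # Same call signature as thread.start_new_thread, but this
     # version returns None rather than a thread identifier.
     global _nwaiting, _task
     _vacant.acquire()              # wait until the slot is empty
     _mutex.acquire()
     if not _nwaiting:
         # No idle thread to take the task, so create one.
         try:
             _start(_run, (), {})
         except:
             # Creation failed: undo our locking and re-raise.
             _mutex.release()
             _vacant.release()
             raise
         _nwaiting += 1
     _task = (func, args, kwargs)
     _nwaiting -= 1                 # one waiting thread is now committed
     _mutex.release()
     _occupied.release()            # wake a waiting thread to take the task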