Simple thread pools
Josiah Carlson
jcarlson at uci.edu
Mon Nov 8 14:53:18 EST 2004
Steve Holden <steve at holdenweb.com> wrote:
>
> Josiah Carlson wrote:
>
> > Jacob Friis <lists at debpro.webcom.dk> wrote:
> >
> >>I have built a script inspired by a post on Speno's Pythonic Avocado:
> >>http://www.pycs.net/users/0000231/weblog/2004/01/04.html#P10
> >>
> >>I'm setting NUM_FEEDERS to 1000.
> >>Is that crazy?
> >
> >
> > Not crazy, but foolish. Thread scheduling in Python reduces performance
> > beyond a few dozen threads. If you are doing system calls (socket.recv,
> > file.read, etc.), your performance will be poor.
> >
> Is this speculative, or do you have some hard evidence to support it? I
> recently rewrote a billing program that delivers statements by email.
> The number of threads it uses is a parameter to the program, and we are
> currently running at 200 with every evidence of satisfaction - this
> month's live run sent something over 10,000 emails an hour.
There is a slowdown (perhaps 'poor' was a bad description).
>>> i = 1
>>> import os
>>> while i < 256:
...     t = os.system('test_thread1.py %i'%i)
...     i *= 2
...
0.0 8.45300006866 204800000
0.0 7.625 204800000
0.0 9.65600013733 204800000
0.0150001049042 11.2969999313 204800000
0.0159997940063 15.8280000687 204800000
0.0780000686646 16.6719999313 204800000
0.172000169754 17.2029998302 204734464
0.125 18.7189998627 204734464
>>>
Back in the days of Python 2.0, I had written what would now be called a
P2P framework. I initially used blocking threads for communication, and
observed that as my number of connections and threads increased, I saw a
marked reduction in throughput, and an increase in latency (even on a
local machine). In switching to an asynchronous framework (heavily
derived from asyncore), I ended up with a system that had nearly constant
throughput regardless of the number of connections.
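That readiness-driven, single-threaded pattern can be sketched with the stdlib selectors module (a modern stand-in for the select() loop asyncore used; this is an illustrative sketch, not the original P2P framework):

```python
import selectors
import socket

# One selector drives many sockets from a single thread -- the same
# select()-based pattern asyncore used.  Sketch only: a real server
# would accept() incoming connections instead of using socketpair().
sel = selectors.DefaultSelector()

def echo(conn):
    """Read whatever is ready and write it straight back."""
    data = conn.recv(1024)
    if data:
        conn.sendall(data)
    else:
        sel.unregister(conn)
        conn.close()

# In-process socket pairs stand in for real client connections.
pairs = [socket.socketpair() for _ in range(4)]
for server_side, _client in pairs:
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ, echo)

# Every "client" writes; one loop services whichever sockets are ready.
for _server, client in pairs:
    client.sendall(b'ping')

served = 0
while served < len(pairs):
    for key, _mask in sel.select(timeout=1):
        key.data(key.fileobj)   # call the echo handler for this socket
        served += 1

replies = [client.recv(1024) for _server, client in pairs]
print(replies)   # four echoed b'ping' payloads
```

However many connection pairs you register, there is still only one thread to schedule, which is why throughput stays flat as connections grow.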
> >
> >>Is there a better solution?
> >
> >
> > Fewer threads. Try running at 10-30. If you are finding that you
> > aren't able to handle the load with those threads, then your
> > processor/disk/etc isn't fast enough to handle the load.
> >
> I'm tempted to say "rubbish", but that would be rude, so instead I'll
> just ask for some evidence :-). Don't forget that in network-based tasks
> the time spent waiting for connection turnarounds can dominate the
> elapsed time for execution - did you perhaps overlook that?
Evidence has been provided.
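For what it's worth, the "fewer threads" suggestion quoted above — a fixed pool of 10-30 workers pulling jobs off a shared queue — looks roughly like this. A hedged sketch in current Python (the queue module was named Queue in the Python of this thread); squaring numbers stands in for fetching and parsing a feed:

```python
import threading
import queue

def worker(jobs, results):
    """Pull work items until the sentinel None arrives."""
    while True:
        item = jobs.get()
        if item is None:          # sentinel: shut this worker down
            break
        results.put(item * item)  # stand-in for real per-job work

NUM_WORKERS = 10                  # a few dozen workers, not 1000
jobs = queue.Queue()
results = queue.Queue()

threads = [threading.Thread(target=worker, args=(jobs, results))
           for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

for n in range(100):              # 100 jobs shared among 10 threads
    jobs.put(n)
for _ in threads:
    jobs.put(None)                # one sentinel per worker
for t in threads:
    t.join()

print(results.qsize())            # all 100 jobs completed
```

The point is that the job count and the thread count are decoupled: 1000 feeds do not require 1000 threads.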
- Josiah
#test_thread1.py
import socket
import time
import threading
import sys
import os

paircount = int(sys.argv[1])

c = threading.Condition()
l = threading.Lock()
ds = 0L

def reader(n, p):
    o_r = os.read
    global ds
    c.acquire()
    c.wait()
    c.release()
    ld = 0
    for i in xrange(n):
        ld += len(o_r(p, 1024))
    l.acquire()
    ds += ld
    l.release()

s = 1024*'\0'

def writer(n, p):
    o_w = os.write
    global ds
    c.acquire()
    c.wait()
    c.release()
    ld = 0
    for i in xrange(n):
        ld += o_w(p, s)
    l.acquire()
    ds += ld
    l.release()

count = 100000
blks = count/paircount

for i in xrange(paircount):
    r,w = os.pipe()
    threading.Thread(target=reader, args=(blks, r)).start()
    threading.Thread(target=writer, args=(blks, w)).start()

time.sleep(1)
t = time.time()
c.acquire()
c.notifyAll()
c.release()
print time.time()-t,
t = time.time()
while len(threading.enumerate()) > 1:
    time.sleep(.05)
print time.time()-t, ds