Simple thread pools

Dave Brueck dave at pythonapocrypha.com
Fri Nov 5 10:54:44 EST 2004


Jacob Friis wrote:
>> Things as maximum number of file descriptors and generally speaking IO
>> operations and their limits are matters of the underlying OS, not of the
>> programming language itself. So show us a way in another language, and then
>> we can tell you how to do that in python.
> 
> 
> I need to download 150000 files several times every day.
> How would you solve that?

Best bet is to use an asynchronous socket library (asyncore / asynchat / twisted 
/ etc). On Linux it's fairly easy to increase the number of file descriptors 
allowed per process, and if you manage your connections appropriately you can 
have thousands of simultaneous open connections.
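
For what it's worth, here is a rough sketch of that approach using asyncio from 
the standard library of current Python 3 (my substitution for the libraries 
named above, not something I've run at your scale). The names raise_fd_limit, 
fetch, fetch_all and the max_open value are made up for illustration. It assumes 
plain HTTP (no TLS, no redirects) and a Unix-like OS, since the resource module 
isn't available on Windows.

```python
import asyncio
import resource
from urllib.parse import urlsplit


def raise_fd_limit(wanted):
    """Raise the soft RLIMIT_NOFILE toward `wanted`; return the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)  # an unprivileged process cannot exceed the hard limit
    if soft != resource.RLIM_INFINITY and soft < wanted:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
        except (ValueError, OSError):
            pass  # the OS refused; keep whatever limit we already have
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


async def fetch(url, limiter):
    """Fetch one URL with a bare HTTP/1.0 GET; return (url, status_line, body)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    async with limiter:  # holds one slot for the lifetime of the connection
        reader, writer = await asyncio.open_connection(parts.hostname, parts.port or 80)
        try:
            request = f"GET {path} HTTP/1.0\r\nHost: {parts.netloc}\r\n\r\n"
            writer.write(request.encode("ascii"))
            await writer.drain()
            raw = await reader.read()  # HTTP/1.0: read until the server closes
        finally:
            writer.close()
            await writer.wait_closed()
    head, _, body = raw.partition(b"\r\n\r\n")
    return url, head.split(b"\r\n", 1)[0], body


async def fetch_all(urls, max_open=500):
    """Download all URLs with at most `max_open` connections open at once."""
    limiter = asyncio.Semaphore(max_open)
    tasks = (fetch(url, limiter) for url in urls)
    # Failures come back as exception objects instead of aborting the whole run.
    return await asyncio.gather(*tasks, return_exceptions=True)
```

You'd call raise_fd_limit() once at startup and then something like 
asyncio.run(fetch_all(urls)). The semaphore is the "manage your connections 
appropriately" part: keep max_open comfortably below the descriptor limit so 
there is room left for log files, DNS lookups and so on.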

Elsewhere I noticed that you said the average object size is only around 15k, so 
once you have the basic system working, you can probably improve performance 
by focusing on things that lower per-transaction overhead: DNS caching, reusing 
connections to servers (and pipelining HTTP requests to those servers), etc. 
I wouldn't bother with that right away, though; better to get the core of the 
system stable and scalable first.
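
When you do get to that stage, connection reuse might look roughly like the 
sketch below. It uses the blocking http.client module from current Python 3 
purely to keep the idea visible; group_by_host and fetch_grouped are names I 
made up, and it assumes plain HTTP servers that honor keep-alive. It issues 
requests one after another on each connection, so it shows reuse only, not 
pipelining.

```python
import http.client
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_host(urls):
    """Map (host, port) to the list of request paths wanted from that server."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        groups[(parts.hostname, parts.port or 80)].append(path)
    return dict(groups)


def fetch_grouped(urls):
    """Fetch every URL, opening one connection per server and reusing it."""
    results = {}
    for (host, port), paths in group_by_host(urls).items():
        conn = http.client.HTTPConnection(host, port, timeout=30)
        try:
            for path in paths:
                conn.request("GET", path)
                resp = conn.getresponse()
                # The body must be read in full before the next request
                # can be sent on the same connection.
                results[(host, port, path)] = (resp.status, resp.read())
        finally:
            conn.close()
    return results
```

Each server then costs one DNS lookup and one TCP handshake per batch rather 
than one per file, which matters when the files are only around 15k each.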

HTH,
-Dave


