Async Client with 1K connections?

William Chang williamichang at hotmail.com
Wed Feb 11 02:12:31 EST 2004


Thank you all for the discussion!  Some additional information:

One of the intended uses is indeed a next-gen web spider.  I did the
math, and yes I will need about 10 cutting-edge PCs to spider like
you-know-who.  But I shouldn't need 100 -- and would rather not
spend money unnecessarily...  Throughput per PC would be on
the order of 1MB/s assuming 200x5KB downloads/sec using 1-2000
simultaneous connections.  (That's 17M pages per day per PC.)
My search & content engine can index and store at such a rate,
but can the spider initiate (at least) 200 new requests per second,
assuming each request lasts 5-10 seconds?
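The numbers above hang together by Little's law: in-flight connections = arrival rate × request duration, so 200 req/s at 5-10 s each means 1000-2000 simultaneous connections, exactly the stated range. asyncio didn't exist in 2004 (this thread was in asyncore territory), but the pacing-plus-backpressure idea can be sketched with today's stdlib asyncio; the URLs and the `fetch` body below are placeholders, not a real HTTP client:

```python
import asyncio

# Back-of-envelope check, using the numbers from the post:
RATE = 200          # new requests started per second
PAGE_KB = 5         # average page size in KB
LATENCY = (5, 10)   # each request lasts 5-10 seconds

throughput_kb = RATE * PAGE_KB      # 1000 KB/s, i.e. ~1 MB/s per PC
pages_per_day = RATE * 86_400       # 17,280,000 pages/day
# Little's law: in-flight connections = arrival rate x request duration
in_flight = (RATE * LATENCY[0], RATE * LATENCY[1])   # (1000, 2000)

async def fetch(url: str) -> bytes:
    # Placeholder for a real HTTP fetch; sleeps to mimic a slow server.
    await asyncio.sleep(0.01)
    return b"x"

async def spider(urls, rate: int, max_in_flight: int):
    # Start `rate` new requests per second, while a semaphore caps
    # the number of simultaneous connections.
    sem = asyncio.Semaphore(max_in_flight)
    interval = 1.0 / rate

    async def bounded(url):
        async with sem:
            return await fetch(url)

    tasks = []
    for url in urls:
        tasks.append(asyncio.create_task(bounded(url)))
        await asyncio.sleep(interval)   # pace new connections
    return await asyncio.gather(*tasks)

# Small demo run with scaled-down numbers:
pages = asyncio.run(spider([f"http://example.invalid/{i}" for i in range(50)],
                           rate=1000, max_in_flight=20))
```

The pacing loop answers the "200 new requests per second" question directly: starting connections is cheap; the real limit is how many the OS and event loop can keep open at once, which the semaphore makes explicit.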

Of course, that assumes the spider algorithm/coordinator is pretty
intelligent and well-engineered.  And the hardware stays up, etc.
Managing storage is certainly nontrivial; at such a scale nothing is
to be taken for granted!

Nevertheless, it shouldn't cost millions.  Maybe $100K :-)

Time for a sanity check?  --William

More information about the Python-list mailing list