Async Client with 1K connections?
William Chang
williamichang at hotmail.com
Wed Feb 11 02:12:31 EST 2004
Thank you all for the discussion! Some additional information:
One of the intended uses is indeed a next-gen web spider. I did the
math, and yes I will need about 10 cutting-edge PCs to spider like
you-know-who. But I shouldn't need 100 -- and would rather not
spend money unnecessarily... Throughput per PC would be on
the order of 1 MB/s, assuming 200 x 5KB downloads/sec using 1000-2000
simultaneous connections. (That's about 17M pages per day per PC.)
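As a quick sanity check on those figures (the rates are the estimates above, nothing more):

```python
# Back-of-envelope check for the post's spider estimates:
# 200 downloads/sec at 5 KB per page.
downloads_per_sec = 200
page_kb = 5

kb_per_sec = downloads_per_sec * page_kb    # 1000 KB/s, i.e. ~1 MB/s
pages_per_day = downloads_per_sec * 86_400  # 17,280,000, i.e. ~17M pages/day

# Little's law: in-flight connections = arrival rate x request duration,
# so 200 req/s lasting 5-10 s each means 1000-2000 simultaneous connections.
conns_at_5s = downloads_per_sec * 5    # 1000
conns_at_10s = downloads_per_sec * 10  # 2000
```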
My search & content engine can index and store at such a rate,
but can the spider initiate (at least) 200 new requests per second,
assuming each request lasts 5-10 seconds?
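For what it's worth, the pacing question can be sketched with modern asyncio (which didn't exist in 2004, when asyncore or Twisted would have filled this role); the launch rate, connection cap, and request durations below are just the assumptions from the post, and the sleep stands in for a real HTTP fetch:

```python
import asyncio
import random

async def fetch(url, sem, lo=5.0, hi=10.0):
    # Placeholder for a real HTTP GET: hold one connection slot for
    # the 5-10 seconds each request is assumed to last.
    async with sem:
        await asyncio.sleep(random.uniform(lo, hi))
        return url

async def spider(urls, max_conns=2000, launch_rate=200, lo=5.0, hi=10.0):
    # Start new requests at ~launch_rate per second, capped at
    # max_conns simultaneous connections (200/s x 5-10 s => 1000-2000).
    sem = asyncio.Semaphore(max_conns)
    tasks = []
    for url in urls:
        tasks.append(asyncio.create_task(fetch(url, sem, lo, hi)))
        await asyncio.sleep(1.0 / launch_rate)  # pace the launches
    return await asyncio.gather(*tasks)
```

The semaphore keeps the concurrency ceiling independent of the launch pacing, so a burst of slow servers can't push the connection count past the cap.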
Of course, that assumes the spider algorithm/coordinator is pretty
intelligent and well-engineered. And the hardware stays up, etc.
Managing storage is certainly nontrivial; at such a scale nothing is
to be taken for granted!
Nevertheless, it shouldn't cost millions. Maybe $100K :-)
Time for a sanity check? --William