Async Client with 1K connections?

William Chang williamichang at hotmail.com
Fri Feb 13 03:33:12 EST 2004


Paul Rubin <http://phr.cx@NOSPAM.invalid> wrote:
> "William Chang" <williamichang at hotmail.com> writes:
> > ...  Throughput per PC would be on
> > the order of 1MB/s assuming 200x5KB downloads/sec using 1-2000
> > simultaneous connections.  (That's 17M pages per day per PC.)
> 
> That's orders of magnitude less than you-know-who.  

Do you know how often you-know-who refreshes its entire index?  A year
ago things were pretty dire: easily over 10% dead links, if I recall correctly.
10 PCs at 17M pages/day each would refresh 3B pages in about 18 days, which is
easily world-class.
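The crawl figures above can be checked with a quick back-of-the-envelope script. It assumes "5KB" means 5,000 bytes. It also reads "1-2000 simultaneous connections" as 1,000 to 2,000 connections, matching the thread's "1K connections". The last part applies Little's law (open connections = completion rate x average time per download) to show how long each download may take at that concurrency.

```python
# Back-of-the-envelope crawl capacity, using the figures quoted in this thread.
# Assumption: "5KB" = 5,000 bytes; "1-2000 connections" = 1,000 to 2,000.
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400

pages_per_sec = 200                     # downloads per second per PC
page_bytes = 5_000                      # average page size (5 KB)
pcs = 10                                # crawler machines
index_pages = 3_000_000_000             # 3B pages to refresh

bytes_per_sec = pages_per_sec * page_bytes              # 1,000,000 B/s = 1 MB/s per PC
pages_per_day = pages_per_sec * SECONDS_PER_DAY         # 17,280,000 pages/day per PC
refresh_days = index_pages / (pcs * pages_per_day)      # ~17.4 days for the whole index

print(f"{bytes_per_sec / 1e6:.1f} MB/s per PC")
print(f"{pages_per_day / 1e6:.2f}M pages/day per PC")
print(f"{refresh_days:.1f} days to refresh {index_pages:,} pages")

# Little's law: open connections = completion rate x average time per download,
# so the tolerable average download time is connections / rate.
for connections in (1_000, 2_000):
    print(f"{connections} connections -> {connections / pages_per_sec:.0f} s per download")
```

With unrounded numbers the refresh comes out a little under 18 days. The Little's law lines show why so many simultaneous connections are needed: at 200 pages/s, each fetch can average 5 to 10 seconds without the crawler falling behind.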

> ... Also, don't forget
> how many queries you have to take from users, and the amount of disk seeks
> needed for each one.

Sure, that's what I do.  However, spidering and querying are independent tasks,
generally speaking.

> 10 MB of internet connectivity is at least a few K$/month all by itself.

Yes, $2500 to be specific.
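A rough sketch of what that bandwidth buys follows. It assumes Paul's "10 MB" means 10 megabytes per second, i.e. the 10 crawler PCs at 1 MB/s each. It also assumes a 30-day billing month and a fully utilized link with no protocol overhead, so the cost per page is a lower bound.

```python
# Aggregate bandwidth and cost per page for the hypothetical 10-PC crawler cluster.
# Assumptions: "10 MB" = 10 MB/s (10 PCs x 1 MB/s), 30-day month, link fully used.
pcs = 10
bytes_per_sec_per_pc = 1_000_000
monthly_cost_usd = 2_500
days_per_month = 30                                  # assumed billing month
pages_per_day_per_pc = 200 * 24 * 60 * 60            # 17,280,000

total_bytes_per_sec = pcs * bytes_per_sec_per_pc              # 10,000,000 B/s
mbit_per_sec = total_bytes_per_sec * 8 / 1e6                  # 80 Mbit/s of line capacity
pages_per_month = pcs * pages_per_day_per_pc * days_per_month # 5,184,000,000 pages
usd_per_million_pages = monthly_cost_usd / (pages_per_month / 1e6)

print(f"{mbit_per_sec:.0f} Mbit/s sustained")
print(f"{pages_per_month / 1e9:.2f}B pages per month")
print(f"${usd_per_million_pages:.2f} per million pages fetched")
```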

There's no reason to be intimidated (if I may use that word) by you-know-who's
marketing message (80,000 machines).  Back in '96, Infoseek could handle 10M
queries per day on a single Sun E4000 with 8 CPUs (<200 MHz), 4 GB of RAM and a
20x4 GB RAID.  Sure, the WWW is much bigger now, but so are the disk drives!
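To put the Infoseek numbers in per-second terms, here is a small sketch. It treats 10M queries/day as an even 24-hour average, although real traffic peaks would run higher. It also reports only raw disk capacity, since the RAID level (and hence usable space) isn't stated.

```python
# Average load implied by the 1996 Infoseek figures quoted above.
# Assumption: queries are spread evenly over 24 hours (real peaks are higher).
SECONDS_PER_DAY = 24 * 60 * 60
queries_per_day = 10_000_000
cpus = 8
raid_disks, disk_gb = 20, 4

queries_per_sec = queries_per_day / SECONDS_PER_DAY   # ~116 queries/s on average
queries_per_sec_per_cpu = queries_per_sec / cpus      # ~14.5 queries/s per CPU
raw_raid_gb = raid_disks * disk_gb                    # 80 GB raw, before RAID overhead

print(f"{queries_per_sec:.0f} queries/s average, {queries_per_sec_per_cpu:.1f} per CPU")
print(f"{raw_raid_gb} GB raw disk")
```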

-- William


