practical limits of urlopen()

Lie Ryan lie.1296 at gmail.com
Tue Jan 27 05:37:59 EST 2009


On Sat, 24 Jan 2009 09:17:10 -0800, webcomm wrote:

> Hi,
> 
> Am I going to have problems if I use urlopen() in a loop to get data
> from 3000+ URLs?  There will be about 2KB of data on average at each
> URL.  I will probably run the script about twice per day.  Data from
> each URL will be saved to my database.
> 
> I'm asking because I've never opened that many URLs before in a loop.
> I'm just wondering if it will be particularly taxing for my server. Is
> it very uncommon to get data from so many URLs in a script?  I guess
> search spiders do it, so I should be able to as well?

urllib itself doesn't impose any limit; what limits your program is your 
connection speed and the hardware that the server and the downloading 
machine run on. Fetching 3000 URLs at ~2KB each is only about 6MB of 
data, a piece of cake for a reasonably modern machine on a decent 
internet connection (the real cost isn't quite that simple, though: each 
request also carries some overhead for sending and processing the HTTP 
headers).
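Something along these lines should do; this is only a rough sketch 
assuming Python 2's urllib2, and list_of_urls / save_to_db are 
placeholders for your own URL list and database code:

    import socket
    import time
    import urllib2

    socket.setdefaulttimeout(10)   # don't let one slow server hang the run

    def fetch_all(urls, delay=0.1):
        """Yield (url, data) for each URL that could be fetched."""
        for url in urls:
            try:
                response = urllib2.urlopen(url)
                data = response.read()   # ~2KB per URL on average
                response.close()
            except (urllib2.URLError, socket.error, socket.timeout), e:
                print "failed to fetch %s: %s" % (url, e)
                continue
            yield url, data
            time.sleep(delay)            # small pause, be polite to the server

    # example usage (save_to_db is whatever writes to your database):
    # for url, data in fetch_all(list_of_urls):
    #     save_to_db(url, data)

The sleep and the timeout aren't strictly necessary, but they keep one 
dead or overloaded server from stalling the whole run twice a day.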

Google indexes millions of pages per day, but they also have one of the 
most advanced server farms in the world.



