Making HTTP requests using Twisted

K.S.Sreeram sreeram at tachyontech.net
Tue Jul 11 05:13:41 EDT 2006


rzimerman wrote:
> I'm hoping to write a program that will read any number of urls from
> stdin (1 per line), download them, and process them. So far my script
> (below) works well for small numbers of urls. However, it does not
> scale to more than 200 urls or so, because it issues HTTP requests for
> all of the urls simultaneously, and terminates after 25 seconds.
> Ideally, I'd like this script to download at most 50 pages in parallel,
> and to time out if and only if any HTTP request is not answered in 3
> seconds. What changes do I need to make?
> 
> Is Twisted the best library for me to be using? I do like Twisted, but
> it seems more suited to batch mode operations. Is there some way that I
> could continue registering url requests while the reactor is running?
> Is there a way to specify a timeout per page request, rather than for
> a batch of page requests?

Have a look at pyCurl. (http://pycurl.sourceforge.net)
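
pyCurl's CurlMulti interface is built for exactly this kind of bounded-parallelism crawling. As a rough illustration of the same pattern using only the standard library (names like `fetch_all` are mine, not from any library), you can cap concurrency with a thread pool and give each request its own socket timeout, so one slow URL never stalls the whole batch:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen

def fetch(url, timeout=3.0):
    # The timeout applies per request (socket-level), not to the batch.
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()

def fetch_all(urls, worker=fetch, max_workers=50):
    # At most `max_workers` requests are in flight at any one time;
    # results (or the exception a request raised) are keyed by URL.
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, u): u for u in urls}
        for fut in as_completed(futures):
            url = futures[fut]
            try:
                results[url] = fut.result()
            except Exception as exc:
                results[url] = exc
    return results
```

With this shape you can keep feeding URLs from stdin into `fetch_all` in chunks, and a request that is not answered within 3 seconds simply shows up as a timeout error for that one URL.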

Regards
Sreeram



