Making HTTP requests using Twisted
Manlio Perillo
manlio_perilloNO at SPAMlibero.it
Tue Jul 11 13:53:20 EDT 2006
rzimerman wrote:
> I'm hoping to write a program that will read any number of urls from
> stdin (1 per line), download them, and process them. So far my script
> (below) works well for small numbers of urls. However, it does not
> scale to more than 200 urls or so, because it issues HTTP requests for
> all of the urls simultaneously, and terminates after 25 seconds.
> Ideally, I'd like this script to download at most 50 pages in parallel,
> and to time out if and only if any HTTP request is not answered in 3
> seconds. What changes do I need to make?
>
Take a look at
http://svn.twistedmatrix.com/cvs/trunk/doc/core/examples/stdiodemo.py?view=markup&rev=15456
And read
http://twistedmatrix.com/documents/current/api/twisted.web.client.HTTPClientFactory.html
You can pass a timeout (in seconds) to the constructor; getPage forwards
its keyword arguments there, so getPage(url, timeout=3) works too.
To download at most 50 pages in parallel you can use a download queue.
Here is a quick example, ABSOLUTELY NOT TESTED:
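from twisted.internet.defer import Deferred, DeferredList
from twisted.web.client import getPage


class DownloadQueue(object):
    SIZE = 50

    def __init__(self):
        self.requests = []    # queued requests
        self.deferreds = []   # requests in the current batch
        self.waiting = False  # True while a DeferredList watches the batch

    def addRequest(self, url, timeout):
        if len(self.deferreds) >= self.SIZE:
            # the batch is full: queue the request
            deferred = Deferred()
            self.requests.append((url, timeout, deferred))
            if not self.waiting:
                # wait for completion of all previous requests
                self.waiting = True
                DeferredList(self.deferreds).addCallback(self._callback)
            return deferred
        else:
            # execute the request now
            deferred = getPage(url, timeout=timeout)
            self.deferreds.append(deferred)
            return deferred

    def _callback(self, results):
        # the previous batch is done: take up to SIZE queued requests
        queue = self.requests[:self.SIZE]
        self.requests = self.requests[self.SIZE:]
        self.deferreds = []
        self.waiting = False
        # execute the requests
        for (url, timeout, deferredHelper) in queue:
            deferred = getPage(url, timeout=timeout)
            self.deferreds.append(deferred)
            # forward the page (or the failure) to the caller's deferred
            deferred.chainDeferred(deferredHelper)
        if self.requests:
            # more requests are queued: watch the new batch as well
            self.waiting = True
            DeferredList(self.deferreds).addCallback(self._callback)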
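
The queue above works in batches: queued requests start only after every
request in the current batch has finished, so one slow page holds back the
rest. As a rough illustration of the alternative, where a new download
starts as soon as any single one finishes, here is a minimal sketch. Note
that it is NOT Twisted code: it uses the standard library's asyncio
(Semaphore and wait_for) only to show the idea. The names fetch_all,
fetch, fake_fetch, MAX_PARALLEL and TIMEOUT are hypothetical, and fetch
stands in for whatever call really downloads a page.

```python
import asyncio

MAX_PARALLEL = 50  # at most this many downloads in flight (from the question)
TIMEOUT = 3.0      # seconds allowed per request (from the question)


async def fetch_all(urls, fetch, limit=MAX_PARALLEL, timeout=TIMEOUT):
    """Run fetch(url) for every URL, at most `limit` at a time.

    `fetch` is a hypothetical coroutine function standing in for the real
    HTTP call. Returns a dict mapping each URL to its result, or to the
    exception it raised (a timeout error if it took longer than `timeout`).
    """
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        # A slot is released as soon as this one fetch ends, so waiting
        # URLs never have to wait for a whole batch to complete.
        async with semaphore:
            try:
                # The timeout covers only the fetch itself, not the time
                # spent waiting for a free slot.
                return url, await asyncio.wait_for(fetch(url), timeout)
            except Exception as exc:
                return url, exc

    pairs = await asyncio.gather(*(bounded(url) for url in urls))
    return dict(pairs)


async def fake_fetch(url):
    # Stand-in for a real download: wait briefly, return a fake body.
    await asyncio.sleep(0.01)
    return "body of " + url


if __name__ == "__main__":
    demo_urls = ["http://example.com/%d" % i for i in range(200)]
    pages = asyncio.run(fetch_all(demo_urls, fake_fetch))
    print(len(pages), "urls processed")
```

In Twisted itself the same effect is usually obtained with
twisted.internet.defer.DeferredSemaphore, which hands out a fixed number
of tokens in much the same way.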
Regards,
Manlio Perillo