Concurrent threads to pull web pages?

Thu Oct 1 21:48:06 EDT 2009

On 01:36 am, kyle at kyleterry.com wrote:
>On Thu, Oct 1, 2009 at 6:33 PM, <exarkun at twistedmatrix.com> wrote:
>>On 1 Oct, 09:28 am, nospam at nospam.com wrote:
>>>Hello
>>>
>>>        I recently asked how to pull companies' ID from an SQLite database,
>>>have multiple instances of a Python script download each company's web
>>>page from a remote server, eg. www.acme.com/company.php?id=1, and use
>>>regexes to extract some information from each page.
>>>
>>>I need to run multiple instances to save time, since each page takes
>>>about 10 seconds to be returned to the script/browser.
>>>
>>>Since I've never written a multi-threaded Python script before, to
>>>save time investigating, I was wondering if someone already had a
>>>script that downloads web pages and saves some information into a
>>>database.
>>
>>There's no need to use threads for this.  Have a look at Twisted:
>>
>>  http://twistedmatrix.com/trac/
>>
>>Here's an example of how to use the Twisted HTTP client:
>>
>>http://twistedmatrix.com/projects/web/documentation/examples/getpage.py
>
>I don't think he was looking for a framework... Specifically a framework
>that you work on.

He's free to use anything he likes.  I'm offering an option he may not 
have been aware of before.  It's okay.  It's great to have options.
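
For anyone weighing the options, here is a rough sketch of the general
non-threaded idea: many slow requests in flight at once, all in a single
thread. It uses the standard-library asyncio module rather than Twisted
itself, so it is not the Twisted API. The fetch argument, the regex, and
the limit value are assumptions made up for illustration; only the URL
pattern comes from the original question.

```python
import asyncio
import re

# Hypothetical pattern: the real page markup is not shown in this thread.
NAME_RE = re.compile(r"<h1>(.*?)</h1>", re.S)


async def fetch_all(company_ids, fetch, limit=10):
    """Download every company's page concurrently, in a single thread.

    `fetch` is an assumption: any coroutine function that takes a URL and
    returns the page text (for example, a wrapper around an async HTTP
    client of your choice). `limit` caps the number of requests in flight.
    """
    sem = asyncio.Semaphore(limit)

    async def one(cid):
        # URL pattern taken from the original question.
        url = "http://www.acme.com/company.php?id=%d" % cid
        async with sem:
            html = await fetch(url)
        match = NAME_RE.search(html)
        return cid, (match.group(1).strip() if match else None)

    # gather() runs all the coroutines concurrently on one event loop.
    pairs = await asyncio.gather(*(one(cid) for cid in company_ids))
    return dict(pairs)


if __name__ == "__main__":
    async def fake_fetch(url):
        # Stand-in for the slow remote server; no network access needed.
        await asyncio.sleep(0.1)
        return "<h1>Company %s</h1>" % url.rsplit("=", 1)[1]

    print(asyncio.run(fetch_all([1, 2, 3], fake_fetch)))
```

Because the waiting overlaps, the total time is roughly that of the
slowest batch of requests rather than the sum of all of them. The
returned dict maps each ID to the extracted text (or None), which can
then be written to the SQLite database in one place.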

Jean-Paul