pagecrawling websites with Python

writeson writeson at charter.net
Fri Apr 1 14:58:11 EST 2005


Hi all,

We've got an application we wrote in Python called pagecrawler that
generates a list of URLs based on SQL queries. It then runs through
this list of URLs, 'browsing' one of our staging servers for all of
them. We do this to build the site dynamically, but each page
generated by a URL is saved as a static HTML file. Anyway, the
pagecrawler program uses Python threads to try to build the pages as
fast as it can. The list of URLs is stored in a queue, and the thread
objects pull URLs from the queue and run them until the queue is
empty. This works okay, but it still seems to take a long time to
build the site this way, even though each page takes only
milliseconds to generate (the pages are generated with PHP on a
separate server). Does anyone have any insight into whether this is a
reasonable approach to building web pages, or should we look at
another design?
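For anyone curious, the queue-and-threads setup described above looks
roughly like this. This is just a minimal sketch, not the actual
pagecrawler code: the `crawl` and `fetch` names, the worker count, and
the in-memory results dict are all placeholders.

```python
import queue
import threading

def crawl(urls, fetch, num_workers=8):
    """Drain a queue of URLs with a pool of worker threads.

    `fetch` stands in for whatever callable actually retrieves a page
    from the staging server and saves it as static HTML.
    """
    url_queue = queue.Queue()
    for url in urls:
        url_queue.put(url)

    results = {}
    lock = threading.Lock()

    def worker():
        # Each thread pulls URLs until the queue is empty, then exits.
        while True:
            try:
                url = url_queue.get_nowait()
            except queue.Empty:
                return
            body = fetch(url)
            with lock:  # dict writes from multiple threads
                results[url] = body
            url_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A dummy `fetch` shows the shape of a run:

```python
pages = crawl(["/home", "/about"], lambda url: "<html>" + url + "</html>",
              num_workers=2)
```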

Thanks in advance,
Doug




More information about the Python-list mailing list