crawler in python and mysql

Mon Nov 12 14:36:30 EST 2007

In the standard Python install (Windows 2.5, at least), there are a couple of example scripts you might find useful:

<python>\Tools\webchecker\webchecker.py

Crawls the specified URL, checking for broken links.

<python>\Tools\webchecker\websucker.py

A variant of the above that archives the specified site locally, including images, though you could probably limit it to HTML easily enough.

I haven't used either extensively, but they appear to work as advertised.  It should be easy to modify one and tie it into the MySQLdb extensions:

http://sourceforge.net/projects/mysql-python
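
If all you need is the HTML of a single URL, the standard library may be enough on its own. Below is a rough sketch, not taken from webchecker or websucker, and written for Python 3 (urllib.request, html.parser) rather than 2.5. The function and class names, and the `pages (url, html)` table, are made up for illustration. `store_page` assumes any DB-API connection, such as one returned by MySQLdb.connect(...).

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


def fetch_html(url, timeout=10):
    """Return the body of `url` decoded as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class LinkCollector(HTMLParser):
    """Collect absolute link targets from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def store_page(conn, url, html, placeholder="%s"):
    """Insert one row into the hypothetical `pages` table.

    MySQLdb uses the %s placeholder style; pass "?" for sqlite3.
    """
    sql = "INSERT INTO pages (url, html) VALUES ({0}, {0})".format(placeholder)
    cur = conn.cursor()
    cur.execute(sql, (url, html))
    conn.commit()
    cur.close()
```

Usage would be along the lines of `html = fetch_html(url)` followed by `store_page(conn, url, html)`. You only need LinkCollector if you want to follow links from the page, in which case call `feed(html)` on it and read its `links` list.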

--

Adam Pletcher

Technical Art Director

Volition/THQ <http://www.volition-inc.com/> 

From: python-list-bounces+adam=volition-inc.com at python.org [mailto:python-list-bounces+adam=volition-inc.com at python.org] On Behalf Of Fabian López
Sent: Monday, November 12, 2007 12:33 PM
To: Python-list at python.org
Subject: crawler in python and mysql

Hi,
I would like to write some code that crawls a URL and retrieves all of its HTML. I have noticed that there are several open-source web crawlers, but they are far more extensive than what I need. I only need to crawl a single URL, and I don't know whether it is as easy as using an HTML parser. Is it? Which libraries would you recommend?
Thanks!!
Fabian
