crawler in python and mysql

Adam Pletcher adam at volition-inc.com
Mon Nov 12 14:36:30 EST 2007


In the standard Python install (Windows 2.5, at least), there are a couple of example scripts you might find useful:

 

<python>\Tools\webchecker\webchecker.py

Crawls specified URL, checking for broken links.

 

<python>\Tools\webchecker\websucker.py

A variant of the above that archives the specified site locally. It includes images, but you could probably limit it to HTML easily enough.

 

I haven't used either extensively, but they appear to work as advertised.  It should be easy to modify one and tie it into the MySQLdb extensions:

http://sourceforge.net/projects/mysql-python
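
As a rough sketch of the "fetch one page and store it" idea (this is not taken from webchecker or websucker), something like the following might be a starting point. It is written for current Python 3 rather than 2.5. The helper names fetch_html and store_page, the table pages(url, html), and the connection details in the comments are all assumptions made up for illustration. The table has to exist already.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Download a single URL and return its body decoded as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def store_page(conn, url, html, placeholder="%s"):
    """Insert one (url, html) row through a DB-API 2.0 connection.

    MySQLdb uses the "%s" parameter style; pass placeholder="?" for
    drivers such as sqlite3. The `pages` table is hypothetical.
    """
    sql = "INSERT INTO pages (url, html) VALUES ({0}, {0})".format(placeholder)
    cursor = conn.cursor()
    try:
        # Values are passed separately so the driver does the quoting.
        cursor.execute(sql, (url, html))
    finally:
        cursor.close()
    conn.commit()


# Possible usage with MySQLdb (host, user, password and database name
# below are placeholders, not real settings):
#
#   import MySQLdb
#   conn = MySQLdb.connect(host="localhost", user="crawler",
#                          passwd="secret", db="crawl")
#   store_page(conn, "http://www.python.org/",
#              fetch_html("http://www.python.org/"))
#   conn.close()
```

Passing the connection in, rather than opening it inside store_page, keeps the function usable with any DB-API driver, so it can be tried against an in-memory sqlite3 database before pointing it at MySQL.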

 

--

Adam Pletcher

Technical Art Director

Volition/THQ <http://www.volition-inc.com/> 

 

From: python-list-bounces+adam=volition-inc.com at python.org [mailto:python-list-bounces+adam=volition-inc.com at python.org] On Behalf Of Fabian López
Sent: Monday, November 12, 2007 12:33 PM
To: Python-list at python.org
Subject: crawler in python and mysql

 

Hi,
I would like to write some code that crawls a URL and retrieves all of its HTML. I have noticed that there are several open-source web crawlers, but they are far more extensive than what I need. I only need to crawl a URL, and I don't know whether it is as easy as using an HTML parser. Is it? Which libraries would you recommend?
Thanks!!
Fabian
