Robot

Eric @ Zomething eric at zomething.com
Fri Jun 18 21:33:40 EDT 2004


export at hope.cz wrote:

> Does anyone know about a script that can walk through webpages and
> extract an information from these web sites according to given
> keyword(s)?
> Thanks for reply

Python is a high-level language well suited to this task. However, the task is not terribly simple, and you will likely have to write your own program to do what you want.

The task has several aspects: (1) spidering - accumulating the URLs you want to visit as you load pages; (2) web page fetching - which is easy:

import urllib.request
# In Python 3, urlopen lives in urllib.request and read() returns bytes
captured_page = urllib.request.urlopen("http://somewebpage.com").read()

and (3) figuring out how to identify what you want to extract from each page.  This can be non-trivial.  The web is not so semantic after all.  Not yet, at least.
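
To make aspects (1) and (3) a bit more concrete, here is a rough sketch rather than a finished robot. It assumes Python 3 and uses only the standard library (html.parser and urllib.parse). The names PageScanner and scan_page are made up for illustration, and "extraction" here means nothing smarter than a case-insensitive substring match on the page's visible text.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageScanner(HTMLParser):
    """Collects absolute link URLs and visible text from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a script or style block.
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def scan_page(html, base_url, keywords):
    """Return (links, hits) for one page.

    links: absolute URLs found in <a href="..."> tags.
    hits:  text fragments containing any keyword (case-insensitive).
    """
    scanner = PageScanner(base_url)
    scanner.feed(html)
    scanner.close()
    lowered = [k.lower() for k in keywords]
    hits = [t for t in scanner.text_parts
            if any(k in t.lower() for k in lowered)]
    return scanner.links, hits
```

A spider would decode each fetched page to a string, pass it to scan_page, and add the returned links to a queue of pages still to visit, keeping a set of URLs already seen so it does not loop. Real pages will need more care than this (character encodings, robots.txt, staying on the sites you care about).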

Good luck and feel free to share your solutions.




Eric Pederson
http://www.songzilla.blogspot.com
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
e-mail me at:
do at something.com
except, increment the "d" and "o" by one letter
and spell something with a "z"
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::



