Help extracting info from HTML source ..
Nikita the Spider
NikitaTheSpider at gmail.com
Fri Jan 26 13:59:46 EST 2007
In article <1169819118.201093.267320 at h3g2000cwc.googlegroups.com>,
"Miki" <miki.tebeka at gmail.com> wrote:
> Hello Shelton,
>
> > I am learning Python, and have never worked with HTML. However, I would
> > like to write a simple script to audit my 100+ Netware servers via their web
> > portal.
> Always use the right tool. BeautifulSoup
> (http://www.crummy.com/software/BeautifulSoup/) is best for web
> scraping (IMO).
>
> from urllib import urlopen
> from BeautifulSoup import BeautifulSoup
>
> html = urlopen("http://www.python.org").read()
> soup = BeautifulSoup(html)
> for link in soup("a"):
>     print link["href"], "-->", link.contents
Agreed. HTML scraping is really complicated once you get into it. It
might be interesting to write such a library just for your own
satisfaction, but if you want to get something done then use a module
that's already written, like BeautifulSoup. Another module that will do
the same job but works differently (and more simply, IMO) is HTMLData by
Connelly Barnes:
http://oregonstate.edu/~barnesc/htmldata/
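
To give a feel for what even a bare-bones link extractor involves, here is a
rough sketch built on the standard library's html.parser module (Python 3). It
uses neither BeautifulSoup nor HTMLData. The names LinkCollector and
extract_links and the sample markup are made up for illustration, and it
ignores most of the messiness of real-world HTML that the dedicated libraries
handle for you.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...>.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) pairs found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup; the second anchor has no href and is skipped.
sample = '<p><a href="/about">About <b>us</b></a> <a name="top">no href</a></p>'
for href, text in extract_links(sample):
    print(href, "-->", text)
```

Anchors without an href are skipped on purpose, since looking up a missing
attribute is an easy way for a scraping loop to fall over on real pages.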
--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more