webspider, regexp not working, why?
alex23
wuwei23 at gmail.com
Fri May 23 22:39:19 EDT 2008
On May 24, 3:26 am, "Reedick, Andrew" <jr9... at ATT.COM> wrote:
> c) If you're going to parse html/xml then bite the bullet and learn one
> of the libraries specifically designed to parse html/xml. Many other
> regex gurus have learned this lesson. Myself included. =)
Agreed. The BeautifulSoup approach is particularly nice (although it is
not part of the stdlib):
>>> import urllib
>>> from BeautifulSoup import BeautifulSoup
>>> html = urllib.urlopen('http://www.python.org/').read()
>>> soup = BeautifulSoup(html)
>>> links = [link['href'] for link in soup('link')]
>>> links[0]
u'http://www.python.org/channews.rdf'
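
For anyone who would rather stay inside the stdlib, here is a rough Python 3
sketch along the same lines using html.parser. The names HrefCollector and
extract_hrefs, and the sample markup, are made up for illustration. It
parses an inline string instead of fetching python.org, so treat it as a
starting point rather than a finished spider:

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect href values from selected tags (hypothetical helper)."""

    def __init__(self, tags=("a", "link")):
        super().__init__()
        self.tags = set(tags)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us;
        # attrs arrives as a list of (name, value) pairs.
        if tag in self.tags:
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_hrefs(html, tags=("a", "link")):
    """Return every href found on the given tags, in document order."""
    parser = HrefCollector(tags)
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Illustrative markup only, not the real python.org page.
sample = """<html><head>
<link rel="alternate" href="http://www.example.org/feed.rdf">
</head><body>
<a href="/about/">About</a> <a name="top">no href here</a>
</body></html>"""

print(extract_hrefs(sample))
```

Passing tags=("a",) limits the result to anchors, which is usually what a
spider wants to follow. Tags without an href are skipped instead of
raising an error.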
- alex23