html parser?
Thorsten Kampe
thorsten at thorstenkampe.de
Tue Oct 18 12:08:54 EDT 2005
* Christoph Söllner (2005-10-18 12:20 +0100)
> right, that's what I was looking for. Thanks very much.
For simple things like that "BeautifulSoup" might be overkill.
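import formatter
import htmllib
import urllib

url = 'http://python.org'

# NullFormatter discards the rendered text; only the collected links matter.
htmlp = htmllib.HTMLParser(formatter.NullFormatter())
htmlp.feed(urllib.urlopen(url).read())
htmlp.close()

# anchorlist holds the href of every <a> tag seen while parsing.
print htmlp.anchorlist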
Then use urlparse to split each of those links/URLs into its components.
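
The urlparse step could look roughly like the sketch below. It is written for Python 3, where htmllib no longer exists, so it rests on some assumptions: html.parser.HTMLParser stands in for htmllib.HTMLParser, urlparse comes from urllib.parse, and the names AnchorCollector, parse_links and the sample HTML string are made up for illustration. The sample string replaces the live fetch so the snippet runs without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AnchorCollector(HTMLParser):
    """Hypothetical helper: collects href values of <a> tags,
    similar in spirit to htmllib's anchorlist attribute."""

    def __init__(self):
        super().__init__()
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.anchorlist.append(value)


def parse_links(html, base_url):
    """Return one urlparse() result per link found in html.

    Relative links are resolved against base_url first."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return [urlparse(urljoin(base_url, href)) for href in parser.anchorlist]


# Made-up sample page, used instead of fetching a real URL.
sample = ('<p><a href="/about/">About</a> '
          '<a href="http://docs.python.org/lib/?q=1#top">Docs</a></p>')

links = parse_links(sample, "http://python.org")
for link in links:
    print(link.scheme, link.netloc, link.path)
```

Each result is a named tuple, so the individual parts are available as link.scheme, link.netloc, link.path, link.query and link.fragment.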