html parser?

Thorsten Kampe thorsten at thorstenkampe.de
Tue Oct 18 12:08:54 EDT 2005


* Christoph Söllner (2005-10-18 12:20 +0100)
> right, that's what I was looking for. Thanks very much.

For simple things like that, BeautifulSoup might be overkill. The standard library's htmllib can collect the links directly:

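# Python 2 only: htmllib does not exist in Python 3
import formatter, htmllib, urllib

url = 'http://python.org'

# HTMLParser records the href of every <a> tag in .anchorlist;
# NullFormatter throws away the rendered text, since only the links matter.
htmlp = htmllib.HTMLParser(formatter.NullFormatter())
htmlp.feed(urllib.urlopen(url).read())
htmlp.close()

print htmlp.anchorlist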
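The htmllib module was removed in Python 3. A minimal sketch of the same idea for Python 3 subclasses html.parser.HTMLParser and records each href itself. The class name AnchorCollector and the helper get_anchors are hypothetical, and the sketch takes an HTML string rather than a URL so it can be tried without network access.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag, roughly like htmllib's anchorlist."""

    def __init__(self):
        super().__init__()
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; attrs is a list of
        # (name, value) pairs, and value may be None for bare attributes.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.anchorlist.append(value)


def get_anchors(html):
    """Parse an HTML string and return the hrefs in document order."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.anchorlist


print(get_anchors('<p><a href="/about/">About</a> <a name="top">Top</a></p>'))
```

To fetch a live page as in the Python 2 version, the string could come from urllib.request.urlopen(url).read().decode(...), assuming you know or guess the page's encoding.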

Then use the urlparse module to split the links/URLs into their components.
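On Python 3 the urlparse functions live in urllib.parse. A minimal sketch, assuming you want absolute URLs: urljoin resolves each (possibly relative) href against the page URL, and urlparse then splits the result. The helper name resolve_links and the tuple it returns are hypothetical choices for illustration.

```python
from urllib.parse import urljoin, urlparse


def resolve_links(base_url, links):
    """Make each href absolute, then split it into scheme, host and path."""
    results = []
    for link in links:
        # e.g. with base 'http://python.org', '/about/' -> 'http://python.org/about/'
        absolute = urljoin(base_url, link)
        parts = urlparse(absolute)  # scheme, netloc, path, params, query, fragment
        results.append((absolute, parts.scheme, parts.netloc, parts.path))
    return results


for absolute, scheme, host, path in resolve_links(
        'http://python.org', ['/about/', 'http://example.com/x?y=1']):
    print(scheme, host, path)
```

Resolving before parsing matters because a relative href such as '/about/' has no scheme or host of its own.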