Parsing an HTML document: how?

George K phark52 at yahoo.com
Thu Sep 23 21:53:51 EDT 2004


This is what my program should do: you give it the URL of a page and a
template file; it downloads that page and then, using the template file,
returns some information.
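The download half of that program can be done with the standard library
alone. Below is a minimal sketch using Python 3's urllib.request (urllib or
urllib2 in Python 2). It assumes UTF-8 when the server sends no charset, and
apply_template is a hypothetical hook standing in for whatever matching
function the template ends up driving.

```python
import urllib.request


def fetch_page(url, timeout=10):
    """Download url and decode it to str.

    Assumption: fall back to UTF-8 when the response declares no charset.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def run(url, template_path, apply_template):
    """Read the template file, fetch the page, and combine them.

    apply_template(template, html) is a hypothetical callback, not a
    library function; plug in whichever matching approach you choose.
    """
    with open(template_path, encoding="utf-8") as f:
        template = f.read()
    return apply_template(template, fetch_page(url))
```

Keeping the fetch step separate from the matching step means the template
logic can be tested on saved HTML without touching the network.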

The way I thought of doing it was to write the template file as a regular
expression and then simply call re.search(template, htmlpage) in my
program. That would work, except that the HTML document contains characters
like ? and * which I would have to escape by hand in the template, so this
solution is impractical. What is a better way to accomplish this? Does
Python have a standard library module for it?
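One way to keep the regex idea without hand-escaping is to let the program
do the escaping: write the template as literal HTML with placeholders, pass
the literal parts through re.escape, and turn each placeholder into a named
capture group. This is only a sketch; the {{name}} placeholder syntax and
the helper names compile_template and extract are hypothetical choices, not
an existing API.

```python
import re

# Hypothetical placeholder syntax: {{name}} marks a value to extract.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def compile_template(template):
    """Turn literal HTML with {{name}} placeholders into a compiled regex."""
    parts = []
    pos = 0
    for m in PLACEHOLDER.finditer(template):
        # Escape the literal HTML so characters like ? and * match themselves.
        parts.append(re.escape(template[pos:m.start()]))
        # Non-greedy named group: capture up to the next literal chunk.
        parts.append(f"(?P<{m.group(1)}>.*?)")
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    # DOTALL lets a captured value span several lines of HTML.
    return re.compile("".join(parts), re.DOTALL)


def extract(template, html):
    """Return {placeholder: value} for the first match, or None."""
    m = compile_template(template).search(html)
    return m.groupdict() if m else None
```

For example, extract('<b>Price?</b> <i>{{price}}</i>', html) finds the
literal "<b>Price?</b> <i>" in the page and returns the text before the
next "</i>". Each placeholder name must be unique within a template, since
Python regexes reject duplicate group names. The approach is brittle if the
page's whitespace or attribute order changes.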

The parsing has to be driven dynamically by the template file, because the
URLs are not fixed.
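The standard library also has a real HTML parser: html.parser.HTMLParser in
Python 3 (the HTMLParser module in Python 2). A parser-based approach avoids
escaping entirely, because the template only has to say which elements to
read. The sketch below collects the text of every element matching a tag
name and a subset of attributes supplied at runtime; the class name
TextCollector and the idea of reading the tag and attributes from the
template file are assumptions for illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the text inside each element matching tag and attrs.

    tag and attrs would come from the template file at runtime
    (hypothetical template format, e.g. one rule per line).
    """

    def __init__(self, tag, attrs=None):
        super().__init__()
        self.tag = tag
        self.want = dict(attrs or {})
        self.depth = 0      # > 0 while inside a matching element
        self.buf = []       # text pieces of the current match
        self.results = []   # stripped text of each completed match

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested elements of the same tag so the right close ends the match.
            if tag == self.tag:
                self.depth += 1
            return
        # Match when the tag agrees and every wanted attribute is present with that value.
        if tag == self.tag and self.want.items() <= dict(attrs).items():
            self.depth = 1
            self.buf = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.buf).strip())

    def handle_data(self, data):
        if self.depth:
            self.buf.append(data)
```

Usage: p = TextCollector("span", {"class": "price"}); p.feed(html);
p.close(); then p.results holds the matched texts. Call close() so any
buffered text is flushed before reading the results.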

More information about the Python-list mailing list