HTML Structure Extraction

Fredrik Lundh fredrik at pythonware.com
Wed Dec 8 14:25:47 EST 2004


<dayzman at hotmail.com> wrote:

> I'm going to write a program that extracts the structure of HTML
> documents. The structure would be in the form of a tree, separating the
> tags and grouping the start and end tags. I think I will use
> htmllib.HTMLParser, is it appropriate for my application? If so, I
> believe I will need to keep track of the depth reached.

you mean like:

    http://www.crummy.com/software/BeautifulSoup/
    http://effbot.org/zone/element-tidylib.htm
    http://utidylib.berlios.de/
    http://www.xmlsoft.org/
    http://effbot.org/zone/pythondoc-elementtree-HTMLTreeBuilder.htm

and a few dozen others?
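
if you'd still rather roll your own, here's a minimal sketch of the
"keep track of the depth" idea from the question. it assumes the
html.parser.HTMLParser class from the current standard library rather
than htmllib.HTMLParser (htmllib is gone in Python 3). the names
TreeBuilder, VOID_TAGS, outline and extract_structure are made up for
this example, and the sketch makes no attempt to repair badly nested
markup the way the libraries above do.

```python
from html.parser import HTMLParser

# Tags that never take an end tag (assumed list, not exhaustive).
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TreeBuilder(HTMLParser):
    """Collect tags into nested (tag, children) tuples."""

    def __init__(self):
        super().__init__()
        self.root = ("#document", [])
        # Stack of currently open nodes; len(stack) - 1 is the depth.
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def extract_structure(html):
    """Parse an HTML string and return the root (tag, children) node."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as a list of lines, indented two spaces per level."""
    tag, children = node
    lines = ["  " * depth + tag]
    for child in children:
        lines.extend(outline(child, depth + 1))
    return lines
```

calling print("\n".join(outline(extract_structure(text)))) on a document
prints one tag per line, indented by depth. text and attributes are
dropped on purpose, since only the tag structure was asked for.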

</F> 





