parse HTML by class rather than tag

gatti at dsdata.it gatti at dsdata.it
Fri Feb 23 03:34:07 EST 2007


On Feb 23, 8:54 am, lorean2... at yahoo.fr wrote:
> Hello,
>
> I would be interested in parsing HTML files by their corresponding
> opening and closing tags, but taking into account the class
> attributes and their values,
[...]
> so I was wondering whether I should go with a regular expression (but I
> do not think so, as I must jump past the inner closing div) or with a simple
> parser. I've searched and found http://www.diveintopython.org/html_processing/basehtmlprocessor.html
> but I would like the parser not to change anything at all (no
> lowercasing).

Horribly brittle idea. Use a robust HTML parser (e.g.
http://www.crummy.com/software/BeautifulSoup/) to build a document
tree, then visit it top down and look at the value of the 'class'
attributes.
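The same idea can be sketched without third-party packages. The sketch below swaps BeautifulSoup for Python 3's standard-library html.parser.HTMLParser (an assumption; the post itself recommends BeautifulSoup), builds a small tree with an explicit stack of open elements, then visits it top down and checks each element's class list. The names Node, TreeBuilder, VOID_TAGS, find_by_class and text_content are hypothetical helpers for illustration only. Note that HTMLParser lowercases tag and attribute names, but get_starttag_text() returns the start tag exactly as written, and attribute values (including class names) keep their case, which is relevant to the "no lowercasing" requirement.

```python
from html.parser import HTMLParser

# Hypothetical helper set: HTML elements that never take a closing tag,
# so they are added to the tree but never pushed onto the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class Node:
    """A minimal tree node; children holds Node objects and text strings in order."""

    def __init__(self, tag, attrs, raw_start=""):
        self.tag = tag
        self.attrs = dict(attrs)
        self.raw_start = raw_start  # start tag exactly as it appeared in the source
        self.children = []

    def classes(self):
        # The class attribute is a whitespace-separated list of names.
        return (self.attrs.get("class") or "").split()


class TreeBuilder(HTMLParser):
    """Builds a document tree; tolerates unclosed void tags and stray end tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.get_starttag_text())
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the innermost open element with this name, so an inner </div>
        # closes the inner div rather than the outer one. Unmatched end tags
        # are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def find_by_class(node, class_name):
    """Visit the tree top down, yielding nodes whose class list contains class_name."""
    if class_name in node.classes():
        yield node
    for child in node.children:
        if isinstance(child, Node):
            yield from find_by_class(child, class_name)


def text_content(node):
    """Concatenate all text below a node in document order."""
    return "".join(c if isinstance(c, str) else text_content(c)
                   for c in node.children)


if __name__ == "__main__":
    builder = TreeBuilder()
    builder.feed('<div class="Post"><div class="body">Hi</div></div>')
    builder.close()
    for match in find_by_class(builder.root, "body"):
        print(match.raw_start, text_content(match))
```

The explicit stack is what handles the nesting a regular expression struggles with: each end tag closes the innermost open element of that name. Class matching here is case-sensitive and matches any one name in a multi-valued class attribute. With the current bs4 package, the equivalent search is `soup.find_all(class_="Post")`.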

Regards,
Lorenzo Gatti
