parse HTML by class rather than tag

lorean2007 at yahoo.fr
Fri Feb 23 02:54:20 EST 2007


Hello,

I would be interested in parsing an HTML file by its matching
opening and closing tags, while taking into account the class
attributes and their values:

<html>
<body>
...
<div class="one">
...
<div class="two">
</div>
...
</div>
...
<div class="one">...</div>
<a href="..." class="three">
</body>
</html>

In this example, I need all the content inside the div with
class="two", or only the content inside the divs with class="one".

So I am wondering whether I should go with regular expressions (I do
not think so, as I must jump past the inner closing div) or with a
simple parser. I searched and found
http://www.diveintopython.org/html_processing/basehtmlprocessor.html
but I would like the parser not to change anything at all (no
lowercasing).

Can you help?

Best.
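
One possible approach to the question above, as a minimal sketch and not
a definitive answer: let a parser find the tag boundaries, but slice the
content out of the original string so that nothing is rewritten or
lowercased. The sketch assumes Python 3 and the standard library's
html.parser module (not the BaseHTMLProcessor from the link above); the
names ClassContentExtractor and extract_by_class are hypothetical. It
counts nested tags of the same name so that an inner closing div does
not end the match early, and it reports only outermost matches.

```python
from html.parser import HTMLParser


class ClassContentExtractor(HTMLParser):
    """Collect the raw source text found inside <tag class="...">...</tag>."""

    def __init__(self, source, tag, class_name):
        super().__init__(convert_charrefs=False)
        self.source = source
        self.tag = tag.lower()          # the parser reports tag names lowercased
        self.class_name = class_name
        # Absolute index where each line starts, so that getpos()
        # (1-based line, 0-based column) can be mapped to a string index.
        self.line_starts = [0]
        for i, ch in enumerate(source):
            if ch == "\n":
                self.line_starts.append(i + 1)
        self.depth = 0                  # nesting depth inside the current match
        self.start = None               # index where the matched content begins
        self.results = []

    def _index(self):
        line, col = self.getpos()
        return self.line_starts[line - 1] + col

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        if self.start is not None:
            # Same tag nested inside a match: remember to skip its closing tag.
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            # Content begins right after the raw text of the opening tag.
            self.start = self._index() + len(self.get_starttag_text())
            self.depth = 1

    def handle_endtag(self, tag):
        if tag != self.tag or self.start is None:
            return
        self.depth -= 1
        if self.depth == 0:
            # Slice the untouched original text up to the closing tag.
            self.results.append(self.source[self.start:self._index()])
            self.start = None


def extract_by_class(source, tag, class_name):
    """Return the raw contents of every outermost <tag> carrying class_name."""
    parser = ClassContentExtractor(source, tag, class_name)
    parser.feed(source)
    parser.close()
    return parser.results
```

Calling extract_by_class(page, "div", "two") on a page held in a string
would return a list with the raw text of each matching div. Because the
result is sliced from the input string, the case of tags and attributes
inside it is left exactly as written.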



