Python3 html.parser
Peter Otten
__peter__ at web.de
Tue Mar 18 07:44:24 EDT 2014
balaji marisetti wrote:
> Hi,
>
> I'm trying to parse a piece of HTML code using `html.parser` in Python3.
> I want to find out the offset of a particular end tag (let's say </p>) and
> then stop processing
> the remaining HTML code immediately. So I wrote something like this.
>
> [code]
> def handle_endtag(self, tag):
>     if tag == mytag:
>         #do something
>         self.reset()
> [code]
>
> I called `reset()` method at the end of `handle_endtag()` method. Now the
> problem is: when I call parser.feed("some html"), it's giving an
> "AssertionError" exception. Isn't the `reset()` method
> supposed to be called inside "handler" methods?
Obviously not ;) After looking into the code I think there is no controlled
way to stop parsing. I suggest that you raise a custom exception instead:

import html.parser

class TagFound(Exception):
    pass

class MyParser(html.parser.HTMLParser):
    def handle_endtag(self, tag):
        if tag == wanted_tag:
            raise TagFound

wanted_tag = "a"
parser = MyParser()
for data in ["<html><body><a></a></body></html>",
             "<html><body><b></b></body></html>"]:
    try:
        parser.feed(data)
    except TagFound:
        print("tag {!r} found".format(wanted_tag))
    else:
        print("tag {!r} not found".format(wanted_tag))
    parser.reset()
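
Since you also want the offset of the end tag, the exception can carry the position along. The following is only a sketch, not tested against your real input: it assumes that HTMLParser.getpos(), called from inside handle_endtag(), reports the line number and column where the end tag starts, which is how CPython's implementation appears to behave. The names OffsetParser and find_endtag are made up for this example, and TagFound is extended with position attributes.

```python
import html.parser


class TagFound(Exception):
    """Raised to abort parsing; carries the position of the end tag."""

    def __init__(self, tag, lineno, offset):
        super().__init__(tag, lineno, offset)
        self.tag = tag
        self.lineno = lineno
        self.offset = offset


class OffsetParser(html.parser.HTMLParser):
    # Hypothetical helper class, not part of the standard library.
    def __init__(self, wanted_tag):
        super().__init__()
        self.wanted_tag = wanted_tag

    def handle_endtag(self, tag):
        if tag == self.wanted_tag:
            # getpos() returns (line number, column offset); the
            # line number starts at 1, the offset at 0.
            lineno, offset = self.getpos()
            raise TagFound(tag, lineno, offset)


def find_endtag(data, wanted_tag):
    """Return (lineno, offset) of the first matching end tag, or None."""
    # A fresh parser per call avoids having to reset() a parser
    # that was interrupted by the exception.
    parser = OffsetParser(wanted_tag)
    try:
        parser.feed(data)
        parser.close()
    except TagFound as found:
        return found.lineno, found.offset
    return None


print(find_endtag("<html><body><a></a></body></html>", "a"))
print(find_endtag("<html><body><b></b></body></html>", "a"))
```

Note that getpos() gives a line/column pair rather than a single index into the string, so for multi-line input you would have to convert it yourself if you need an absolute offset.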