Looking for a decent HTML parser for Python...
Just Another Victim of the Ambient Morality
ihatespam at hotmail.com
Tue Dec 5 22:25:10 EST 2006
I'm trying to parse HTML in a very generic way.
So far, I'm using SGMLParser from the sgmllib module. The problem is that
it forces you to handle specific tags through object methods like
start_a(), start_p() and so on, so you have to know in advance exactly
which tags you want to handle. I want to be able to handle the start tags
of any and all elements, as one would with the Xerces C++ XML parser. In
other words, I would like a single start() method that is called whenever
any tag is encountered. How can I do this?
Thank you...
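
To make the request concrete, here is a minimal sketch of the kind of generic start-tag callback described above. It is an illustration under stated assumptions, not a definitive answer: it uses html.parser.HTMLParser from the Python 3 standard library rather than sgmllib (which exists only in Python 2), and the class name AnyTagParser and its seen attribute are hypothetical names chosen for the example. Within sgmllib itself, SGMLParser.unknown_starttag(tag, attrs) plays a similar role as the fallback for tags that have no start_xxx() method.

```python
from html.parser import HTMLParser


class AnyTagParser(HTMLParser):
    """Records every start tag, whatever its name (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.seen = []  # (tag, attrs) pairs in document order

    def handle_starttag(self, tag, attrs):
        # Called once for each start tag; the parser lowercases the tag
        # name and passes attrs as a list of (name, value) pairs.
        self.seen.append((tag, dict(attrs)))


parser = AnyTagParser()
parser.feed('<p>See <a href="http://example.com">this</a><br></p>')
parser.close()

# Names of all start tags encountered, in order.
tags = [tag for tag, _ in parser.seen]
```

A single overridden method receives every start tag, so no per-tag methods such as start_a() or start_p() are needed; end tags can be caught the same way by overriding handle_endtag().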