Looking for a decent HTML parser for Python...

Just Another Victim of the Ambient Morality ihatespam at hotmail.com
Tue Dec 5 22:25:10 EST 2006


    I'm trying to parse HTML in a very generic way.
    So far, I'm using SGMLParser from the sgmllib module.  The problem is that 
it dispatches tags to tag-specific object methods like start_a(), start_p() 
and so on, so you have to know in advance exactly which tags you want to 
handle.  I want to be able to handle the start tag of any element, as one 
can in the Xerces C++ XML parser.  In other words, I would like a single 
start() method that is called whenever any start tag is encountered.  How 
may I do this?
    Thank you...
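
    For illustration, here is a minimal sketch of the kind of generic 
start-tag callback described above.  It is not an sgmllib solution: it 
assumes Python 3's html.parser module, whose HTMLParser class calls 
handle_starttag() for every start tag.  The names TagCollector and 
collect_start_tags are hypothetical, chosen only for this example.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper that records every start tag it encounters."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # Called once per start tag, whatever the tag is.
        # tag arrives lowercased; attrs is a list of (name, value) pairs.
        self.seen.append((tag, dict(attrs)))


def collect_start_tags(html_text):
    """Parse html_text and return a list of (tag, attributes) tuples."""
    parser = TagCollector()
    parser.feed(html_text)
    parser.close()
    return parser.seen
```

    Calling collect_start_tags('<p>Hi <a href="x.html">link</a></p>') 
should report both the p and the a tag without either being named in the 
parser class.  With sgmllib itself, overriding SGMLParser.unknown_starttag() 
may serve a similar purpose, since it is called for tags that have no 
start_xxx() or do_xxx() method.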





