Looking for a decent HTML parser for Python...

Just Another Victim of the Ambient Morality ihatespam at hotmail.com
Tue Dec 5 23:10:14 EST 2006


"Just Another Victim of the Ambient Morality" <ihatespam at hotmail.com> wrote 
in message news:qKqdh.303031$tl2.45967 at fe10.news.easynews.com...
>    I'm trying to parse HTML in a very generic way.
>    So far, I'm using SGMLParser in the sgmllib module.  The problem is 
> that it forces you to parse very specific tags through object methods like 
> start_a(), start_p() and the like, forcing you to know exactly which tags 
> you want to handle.  I want to be able to handle the start tags of any and 
> all tags, like how one would do in the Xerces C++ XML parser.  In other 
> words, I would like a simple start() method that is called whenever any 
> tag is encountered.  How may I do this?
>    Thank you...

    Okay, I think I found what I'm looking for: the HTMLParser class in the
HTMLParser module.
    Thanks...
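
For reference, HTMLParser calls a single handle_starttag(tag, attrs) method for every start tag, which is the generic start() hook asked about above. (sgmllib's SGMLParser also offered unknown_starttag(tag, attributes) for tags without a start_<tag>() method.) The sketch below assumes Python 3, where the module was renamed to html.parser; in the Python 2 of this thread the import was "from HTMLParser import HTMLParser". The class name TagCollector is a hypothetical example, not part of the library.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical example: record every start tag, whatever its name."""

    def __init__(self):
        super().__init__()
        self.tags = []  # (tag, attrs) pairs in document order

    def handle_starttag(self, tag, attrs):
        # Called for any start tag; tag is lowercased and attrs is a
        # list of (name, value) tuples.
        self.tags.append((tag, attrs))


parser = TagCollector()
parser.feed('<p>Hello <a href="/x">link</a><br></p>')
parser.close()
print(parser.tags)  # [('p', []), ('a', [('href', '/x')]), ('br', [])]
```

No per-tag methods are needed; handle_endtag(tag) and handle_data(data) can be overridden the same way for end tags and text.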






More information about the Python-list mailing list