Looking for a decent HTML parser for Python...

Just Another Victim of the Ambient Morality ihatespam at hotmail.com
Wed Dec 6 00:02:03 EST 2006


"Just Another Victim of the Ambient Morality" <ihatespam at hotmail.com> wrote 
in message news:Gordh.303466$tl2.18227 at fe10.news.easynews.com...
>
>    Okay, I think I found what I'm looking for in HTMLParser in the 
> HTMLParser module.

    Except it appears to be buggy or, at least, not very robust.  There are 
websites on which it stops parsing prematurely, before reaching the end of 
the page.  I have a sneaking feeling the sgml parser would be more robust, 
if only it had that one feature I am looking for.
    Can someone help me out here?
    Thank you...
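
    For anyone who wants to reproduce this, here is a minimal sketch of the 
kind of test harness I mean.  It assumes a current Python, where the class 
lives in html.parser rather than the old HTMLParser module; TagCollector and 
parse_html are hypothetical names made up for illustration, and the sample 
markup is invented, not taken from one of the failing websites.  It only 
records what the parser reports, so you can see how far it gets through a 
sloppy page.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records each start tag and non-blank text chunk."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Called once per opening tag, whether or not it is ever closed.
        self.tags.append(tag)

    def handle_data(self, data):
        # Skip chunks that are only whitespace.
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


def parse_html(markup):
    """Feed the markup to a fresh parser and return what it saw."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()  # flush anything still buffered
    return parser.tags, parser.text


# Invented sample: unclosed <p> elements and a missing </html>.
tags, text = parse_html("<html><body><p>Hello<p>World</body>")
print(tags)
print(text)
```

    Comparing the collected tags against the end of the page source is a 
quick way to tell whether the parser actually reached the end of the 
document or gave up partway.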





