Beginner: HTML Parsing

Fri May 17 05:12:20 EDT 2002

On Fri, 2002-05-17 at 02:14, Kragen Sitaker wrote:
> "J. David Lashar" <dlashar at sprynet.com> writes:
> > As a beginner, I'm working through the O'Reilly books mentioned in an
> > earlier posting, but I haven't found much guidance on parsing an HTML file
> > once I've pulled it down with httplib.  And I'm finding the Python Library
> > Reference to be a bit cryptic. Could someone point to resources or provide
> > examples?
> 
> If possible, use Perl and HTML::Parser (or HTML::LinkExtor if that's
> what you want) instead.  Python doesn't yet have anything nearly as
> good.

Well, if you're going to talk like that, you can't just stop there.  How
is HTML::Parser better than, say, htmllib?  Or mxTidy (to translate to
XHTML) with an XML parser?
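For comparison, here is a minimal sketch of the pure-Python route, roughly analogous to HTML::LinkExtor. It assumes Python 3, where htmllib no longer exists and the stdlib's event-driven parser is html.parser.HTMLParser. The LinkExtractor class, the extract_links function, and the LINK_ATTRS table of which attributes count as links are hypothetical names chosen for illustration, not part of any library.

```python
# Minimal link extractor in the spirit of Perl's HTML::LinkExtor.
# Assumption: Python 3, where the stdlib HTML parser lives in html.parser
# (htmllib was removed in Python 3).
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (tag, attribute, value) triples for link-bearing attributes."""

    # Hypothetical choice of which tag/attribute pairs count as links.
    LINK_ATTRS = {"a": "href", "img": "src", "link": "href", "script": "src"}

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes the tag name lowercased and attrs as a list of
        # (name, value) pairs, also with lowercased names.
        wanted = self.LINK_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            # value is None for a bare attribute such as <a href>; skip it.
            if name == wanted and value is not None:
                self.links.append((tag, name, value))


def extract_links(html_text):
    """Parse a string of HTML and return the links found in it."""
    parser = LinkExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<p>See <a href="http://www.python.org/">Python</a> <img src="logo.png"></p>'
    for tag, attr, url in extract_links(page):
        print(tag, attr, url)
```

To use it on a page fetched with httplib (http.client in Python 3), read the response body, decode it to a string, and pass that to extract_links. Unlike HTML::Parser's argspec, the handler signature here is fixed: you always get the tag name and the full attribute list.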

I notice HTML::Parser has some extra features in the way the argspec is
defined, which seem convenient but not huge.  Is there something else I'm missing?

  Ian