Looking for a decent HTML parser for Python...

Stephen Eilert spedrosa at gmail.com
Wed Dec 6 11:41:40 EST 2006


Fredrik Lundh wrote:

> >     Except it appears to be buggy or, at least, not very robust.  There are
> > websites for which it falsely terminates early in the parsing.
>
> which probably means that the sites are broken.  the amount of broken
> HTML on the net is staggering, as is the amount of code in a typical web
> browser for dealing with all that crap.  for a more tolerant parser, see:
>
>      http://www.crummy.com/software/BeautifulSoup/
>
> </F>

+1 for BeautifulSoup.

Its documentation is quite brief and sometimes confusing, but I've
found the library itself to be the easiest parser I've ever worked with.
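
To give an idea of what "tolerant" means in practice, here is a minimal
sketch. It assumes the current bs4 package (installed as beautifulsoup4)
and its bundled "html.parser" backend; older releases use a different
import. The markup and the extract_links helper are made up for
illustration.

```python
# Assumption: the modern bs4 package is installed (pip install beautifulsoup4).
from bs4 import BeautifulSoup

# Deliberately broken markup: unclosed <li>, <a> and <b> tags, no </ul>,
# and no closing </html>.
BROKEN_HTML = """
<html><body>
<ul>
  <li><a href="/one">first
  <li><a href="/two">second <b>bold
</body>
"""


def extract_links(markup):
    """Return the href of every <a> tag that has one, in document order."""
    soup = BeautifulSoup(markup, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


if __name__ == "__main__":
    print(extract_links(BROKEN_HTML))
```

The parser does not bail out on the unclosed tags, so both links are
still found. How the broken tags end up nested in the resulting tree
depends on the backend, so I would only rely on lookups like this one.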


Stephen



