Looking for a decent HTML parser for Python...

Fredrik Lundh fredrik at pythonware.com
Wed Dec 6 00:47:49 EST 2006


>     Except it appears to be buggy or, at least, not very robust.  There are 
> websites for which it falsely terminates early in the parsing.

which probably means that the sites are broken.  the amount of broken 
HTML on the net is staggering, as is the amount of code in a typical web 
browser for dealing with all that crap.  for a more tolerant parser, see:

     http://www.crummy.com/software/BeautifulSoup/
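
To make "tolerant" concrete, here is a minimal sketch of pulling links out of deliberately broken markup. It is not BeautifulSoup's API: as a stand-in it uses the standard library's html.parser from current Python 3, which, as far as I know, does not raise on malformed input. The names LinkCollector, extract_links and BROKEN_HTML, and the example.com URLs, are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, whether or not the tags are closed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Hypothetical helper: feed the markup and return every href seen."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Deliberately broken markup (an assumption for illustration): unclosed <p>,
# <b> and <a> tags, an unquoted attribute value, and no </body> or </html>.
BROKEN_HTML = """<html><body>
<p>First paragraph <b>bold text
<p>Second paragraph <a href="http://example.com/one">one
<a href=http://example.com/two>two</a>
"""

if __name__ == "__main__":
    print(extract_links(BROKEN_HTML))
```

A tolerant parser keeps going past the missing close tags instead of stopping at the first problem. BeautifulSoup goes further and builds a navigable tree out of such input, which is usually what you want for real-world pages.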

</F>



