HTMLParser.HTMLParseError: EOF in middle of construct

John Nagle nagle at animats.com
Wed Jun 20 00:19:09 EDT 2007


none wrote:
> Gabriel Genellina wrote:
> 
>> On Mon, 18 Jun 2007 16:38:18 -0300, Sergio Monteiro Basto 
>> <sergio at sergiomb.no-ip.org> wrote:
>>
>>> Can someone explain to me what is wrong with this site?
>>>
>>> python linkExtractor3.py http://www.noticiasdeaveiro.pt > test

> OK, but my problem is that I do not understand what the specific
> problem at line 1173 is.
> 
>> HTMLParser expects valid HTML - try a different tool, like 
>> BeautifulSoup, which is specially designed to handle malformed pages.
>>
>> --Gabriel Genellina

    Yes, you almost have to use BeautifulSoup on real-world web pages.
Even that may not be enough; I have my own even more robust version of
BeautifulSoup.  (I've sent the fixes, which are small, to the author.)

    The usual BeautifulSoup killer is improperly terminated HTML comments. The
default action is to suck up the rest of the entire document into
the comment, which is usually not what you want.  I have a fix for that
at

http://mail.python.org/pipermail/python-list/2007-May/440370.html
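
    To illustrate the kind of problem described above, here is a minimal
sketch of one possible workaround. It is not the fix from the linked post.
It preprocesses the page so that a comment with no closing marker ends at
the end of its own line instead of swallowing the rest of the document.
The helper name close_unterminated_comments, the TagCollector class, and
the end-of-line rule are all assumptions made for this example, and the
code is written for Python 3's html.parser rather than the 2007 HTMLParser
module.

```python
from html.parser import HTMLParser

COMMENT_OPEN = "<!--"
COMMENT_CLOSE = "-->"


def close_unterminated_comments(html):
    """Close any comment that lacks '-->' at the end of its line.

    Hypothetical helper: the end-of-line rule is a heuristic chosen
    for this sketch, not behaviour taken from any library.
    """
    out = []
    pos = 0
    while True:
        start = html.find(COMMENT_OPEN, pos)
        if start == -1:
            # No more comments: copy the remainder unchanged.
            out.append(html[pos:])
            break
        end = html.find(COMMENT_CLOSE, start + len(COMMENT_OPEN))
        if end != -1:
            # Properly terminated comment: copy it through its closer.
            end += len(COMMENT_CLOSE)
            out.append(html[pos:end])
            pos = end
            continue
        # Unterminated comment: end it at the end of the current line.
        eol = html.find("\n", start)
        if eol == -1:
            eol = len(html)
        out.append(html[pos:eol])
        out.append(" " + COMMENT_CLOSE)
        pos = eol
    return "".join(out)


class TagCollector(HTMLParser):
    """Record the name of every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


broken = "<p>one</p>\n<!-- broken comment\n<p>two</p>\n"
repaired = close_unterminated_comments(broken)

collector = TagCollector()
collector.feed(repaired)
collector.close()
print(collector.tags)
```

    After the repair, the second paragraph is reported as a tag again
rather than being hidden inside the comment. The heuristic will cut short
a multi-line comment that is merely missing its terminator, so treat it as
a starting point only.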

				John Nagle


