Output of HTML parsing
Stefan Behnel
stefan.behnel-n05pAM at web.de
Tue Jun 19 11:27:28 EDT 2007
Jackie wrote:
> On 6/15, 2:01, Stefan Behnel <stefan.behnel-n05... at web.de> wrote:
>> Jackie wrote:
>
>> import lxml.etree as et
>> url = "http://www.economics.utoronto.ca/index.php/index/person/faculty/"
>> tree = et.parse(url)
>>
>
>> Stefan
>>
>> - -
>
> Thank you. But when I tried to run the above part, the following
> message showed up:
>
> Traceback (most recent call last):
>   File "D:\TS\Python\workspace\eco_department\lxml_ver.py", line 3, in <module>
>     tree = et.parse(url)
>   File "etree.pyx", line 1845, in etree.parse
>   File "parser.pxi", line 928, in etree._parseDocument
>   File "parser.pxi", line 932, in etree._parseDocumentFromURL
>   File "parser.pxi", line 849, in etree._parseDocFromFile
>   File "parser.pxi", line 557, in etree._BaseParser._parseDocFromFile
>   File "parser.pxi", line 631, in etree._handleParseResult
>   File "parser.pxi", line 602, in etree._raiseParseError
> etree.XMLSyntaxError: line 2845: Premature end of data in tag html line 8
>
> Could you please tell me where it went wrong?
Ah, OK, then the page is not actually XHTML but broken HTML, which the
default XML parser rejects because it is not well-formed. Use this idiom
instead:
parser = et.HTMLParser()
tree = et.parse(url, parser)
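To see the difference without fetching the page, here is a minimal sketch. It assumes lxml is installed, and the HTML snippet and the helper names `parse_with_xml_parser` / `parse_with_html_parser` are made up for illustration. It feeds the same unclosed markup to the strict XML parser and to `et.HTMLParser()`:

```python
from io import StringIO

import lxml.etree as et

# Hypothetical "broken" HTML: <p>, <body> and <html> are never closed.
# Browsers accept this, but it is not well-formed XML.
BROKEN_HTML = "<html><head><title>Faculty</title></head><body><p>Prof. A<p>Prof. B"


def parse_with_xml_parser(source):
    """Return the tree, or None if the strict XML parser rejects the input."""
    try:
        return et.parse(StringIO(source))
    except et.XMLSyntaxError as exc:
        print("XML parser failed:", exc)
        return None


def parse_with_html_parser(source):
    """Parse with lxml's forgiving HTML parser, which repairs missing end tags."""
    parser = et.HTMLParser()
    return et.parse(StringIO(source), parser)


xml_tree = parse_with_xml_parser(BROKEN_HTML)    # fails, like the traceback above
html_tree = parse_with_html_parser(BROKEN_HTML)  # succeeds

# The HTML parser implicitly closes each dangling <p>, so both paragraphs survive.
paragraphs = [p.text for p in html_tree.getroot().iter("p")]
print(paragraphs)
```

With a real page you would pass the URL instead of the `StringIO` object, exactly as in the two lines above.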
Stefan
More information about the Python-list mailing list