[Tutor] Problem parsing an HTML-Website: SGML Parser error
Charlie Clark
charlie@begeistert.org
Fri, 02 Nov 2001 14:53:31 +0100
Dear all,
I've currently got a bunch of scripts all running and collecting data from various websites. The inevitable has happened and I've now got a website which cannot be
parsed at all. The pages seem to have been written by hand in Microsoft Word.
The error generated is:
  File "C:\Python21\lib\sgmllib.py", line 91, in feed
    self.goahead(0)
  File "C:\Python21\lib\sgmllib.py", line 158, in goahead
    k = self.parse_declaration(i)
  File "C:\Python21\lib\sgmllib.py", line 238, in parse_declaration
    raise SGMLParseError(
SGMLParseError: unexpected char in declaration: '<
An example can be found at
http://www.royal-muenchen.de/archiv/f_012836.htm
What I need to know is the best way of finding out what is wrong in the source, so that I can try to remove it before sending it to the parser.
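Since the traceback ends in parse_declaration, the trouble is presumably in some markup beginning with "<!". As a rough sketch of one way to track it down (not a definitive fix), the following lists every "<!" construct that is neither a complete comment nor a DOCTYPE, with its line number, and can strip those constructs before the page goes to the parser. The function names and the sample page are made up for illustration, and the "<![if ...]>" markers in the sample are only an assumption about what Word-generated pages may contain; I have not confirmed that this is what the page above uses.

```python
import re

# Matches either a complete comment ("comment") or any other "<!" construct
# ("decl"). A "decl" runs up to the next "<" or ">" and includes a closing ">"
# if one is there, so a declaration that is never closed is still caught.
_MARKUP = re.compile(
    r"(?P<comment><!--.*?-->)|(?P<decl><![^<>]*>?)",
    re.DOTALL,
)


def _is_suspect(match):
    """True for a '<!' construct that is neither a comment nor a DOCTYPE."""
    decl = match.group("decl")
    return decl is not None and decl[2:9].upper() != "DOCTYPE"


def find_suspect_declarations(html):
    """Return a list of (line_number, text) for each suspect construct."""
    hits = []
    for match in _MARKUP.finditer(html):
        if _is_suspect(match):
            line_no = html.count("\n", 0, match.start()) + 1
            hits.append((line_no, match.group("decl")))
    return hits


def strip_suspect_declarations(html):
    """Return html with suspect constructs removed; comments and DOCTYPE stay."""
    return _MARKUP.sub(
        lambda m: "" if _is_suspect(m) else m.group(0), html
    )


# Hypothetical sample page, not taken from the site mentioned above.
sample = (
    "<!DOCTYPE html>\n"
    "<html><body>\n"
    "<!-- a normal comment -->\n"
    "<p><![if !supportEmptyParas]>&nbsp;<![endif]></p>\n"
    "</body></html>\n"
)

for line_no, text in find_suspect_declarations(sample):
    print(line_no, text)

cleaned = strip_suspect_declarations(sample)
```

The cleaned text could then be passed to the parser's feed() in place of the raw page. Complete comments are matched first so that a marker sitting inside a comment is left alone; a comment that is never closed would be reported as suspect as well.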
A response to me directly would be appreciated. Many thanx for your help.
Charlie
Charlie Clark
Helmholtzstr. 20
Düsseldorf
40215
Tel: +49-178-782-6226