[Tutor] Problem parsing an HTML-Website: SGML Parser error

Charlie Clark charlie@begeistert.org
Fri, 02 Nov 2001 14:53:31 +0100


Dear all,

I've currently got a bunch of scripts running and collecting data from various websites. The inevitable has happened and I've now got a website which cannot be parsed at all. The pages seem to be written in Microsoft Word by hand.

The error generated is:

  File "C:\Python21\lib\sgmllib.py", line 91, in feed
    self.goahead(0)
  File "C:\Python21\lib\sgmllib.py", line 158, in goahead
    k = self.parse_declaration(i)
  File "C:\Python21\lib\sgmllib.py", line 238, in parse_declaration
    raise SGMLParseError(
SGMLParseError: unexpected char in declaration: '<

An example can be found at

http://www.royal-muenchen.de/archiv/f_012836.htm

What I need to know is the best way of finding out what is wrong in the source, so that I can try to remove it before sending it to the parser.
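
For illustration, here is a rough sketch of the kind of pre-check I have in mind. It assumes the trouble comes from a "<!" declaration that runs into the next "<" before it is closed by ">", which is what the error message suggests but which I have not confirmed against the page. The names find_bad_declarations and strip_declarations are hypothetical helpers, not part of sgmllib, and the regular expressions are only an approximation of what the parser accepts.

```python
import re

# "<!" not followed by "--" (a declaration, not a comment), then everything
# up to the next "<" or ">", whichever comes first.
DECL = re.compile(r"<!(?!--)([^<>]*)([<>]?)")


def find_bad_declarations(html):
    """Return (line_number, snippet) for each declaration not closed by '>'."""
    problems = []
    for m in DECL.finditer(html):
        if m.group(2) != ">":  # hit '<' or end of input before a closing '>'
            line = html.count("\n", 0, m.start()) + 1
            problems.append((line, html[m.start():m.end() + 20]))
    return problems


def strip_declarations(html):
    """Remove every non-comment declaration, closed or not (assumption:
    the script does not need the DOCTYPE or similar declarations)."""
    return re.sub(r"<!(?!--)[^<>]*>?", "", html)


if __name__ == "__main__":
    sample = "<html>\n<!DOCTYPE foo <p>text</p>\n<!-- kept -->\n</html>"
    for line, snippet in find_bad_declarations(sample):
        print("line %d: %r" % (line, snippet))
    print(strip_declarations(sample))
```

The first function only reports where a suspicious declaration starts, so the source can be inspected by hand; the second is the blunt option of dropping all such declarations before feeding the text to the parser. Comments ("<!-- ... -->") are deliberately left alone.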

A response to me directly would be appreciated. Many thanks for your help.

Charlie

Charlie Clark
Helmholtzstr. 20
Düsseldorf
40215
Tel: +49-178-782-6226