SGMLParser problem

sanjay sanjay2kind at yahoo.com
Fri Nov 8 10:54:15 EST 2002


Hi,

Does anyone have a suggestion for the following problem? Some Word
documents have been saved as HTML pages from MS Word. I want to filter
out HTML tags like
<o:p></o:p>,
<![if !supportEmptyParas]> <![endif]>, etc. I couldn't manage it
using SGMLParser; it raises an error like this:

Traceback (most recent call last):
  File "D:\Python21\Pythonwin\pywin\framework\scriptutils.py", line 301, in RunScript
    exec codeObject in __main__.__dict__
  File "C:\Program Files\test\msword_html_parser_1.py", line 166, in ?
    s =  get_parsed_content1()
  File "C:\Program Files\test\msword_html_parser_1.py", line 71, in get_parsed_content1
    mk.feed(obj.read())
  File "D:\Python21\lib\sgmllib.py", line 91, in feed
    self.goahead(0)
  File "D:\Python21\lib\sgmllib.py", line 158, in goahead
    k = self.parse_declaration(i)
  File "D:\Python21\lib\sgmllib.py", line 238, in parse_declaration
    raise SGMLParseError(
SGMLParseError: unexpected char in declaration: '<'
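
One possible workaround, sketched under assumptions: rather than making the parser accept Word's non-standard markup, strip the two constructs quoted above with regular expressions before parsing. This assumes Python 3, where sgmllib no longer exists and html.parser.HTMLParser is the standard-library replacement. The pattern names, strip_word_markup, TextCollector and word_html_to_text are all hypothetical helpers; the patterns only cover Word conditional markers (<![if ...]>, <![endif]>) and namespace-prefixed tags such as <o:p>, not every Word quirk, and regexes are no substitute for a real HTML parser.

```python
import re
from html.parser import HTMLParser

# Word "downlevel-revealed" conditional markers such as <![if !supportEmptyParas]>
# and <![endif]>. Only the markers are removed; the content between them is kept.
_WORD_CONDITIONAL = re.compile(r"<!\[(?:if\b[^\]]*|endif)\]>", re.IGNORECASE)

# Start, end or self-closing tags whose name has a namespace prefix,
# e.g. <o:p>, </o:p>, <st1:place>. Ordinary tags like <p> have no colon
# in the name, so they do not match.
_NAMESPACED_TAG = re.compile(r"</?[A-Za-z][\w.-]*:[\w.-]+(?:\s[^>]*)?/?>")


def strip_word_markup(html):
    """Remove Word-specific conditional markers and namespaced tags."""
    html = _WORD_CONDITIONAL.sub("", html)
    return _NAMESPACED_TAG.sub("", html)


class TextCollector(HTMLParser):
    """Collect the character data of the cleaned HTML (entities decoded)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def word_html_to_text(html):
    """Pre-filter the Word markup, then parse what is left."""
    parser = TextCollector()
    parser.feed(strip_word_markup(html))
    parser.close()
    return "".join(parser.chunks)


if __name__ == "__main__":
    sample = ('<p class=MsoNormal>Hello<o:p></o:p></p>\n'
              '<p><![if !supportEmptyParas]>&nbsp;<![endif]><o:p></o:p></p>')
    print(strip_word_markup(sample))
    print(repr(word_html_to_text(sample)))
```

With this approach the parser never sees the <![...]> sequences that SGMLParser choked on, and the ordinary tags such as <p> are left in place for whatever processing follows.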


Thanks,
Sanjay


