How to parse XHTML with xml.parsers.xmlproc?

Paavo Hartikainen pahartik at sci.fi
Mon Sep 17 13:22:47 EDT 2001


[[ this message contains excessively long lines, sorry about them ]]

Martin von Loewis writes:

> <head><meta http-equiv="something"></head>

That was the case.  I was well aware of this requirement, but the
problem was hidden from me: it only occurred while the document was
held in memory, so I did not see it in my files before or after
processing.

> As for finding DTDs: If your document contains PUBLIC identifier,
> xmlproc will attempt to search catalogs. If there is only a SYSTEM
> identifier, it will interpret this as an URL. If it looks like a
> relative path name, it will look for the DTD relative to the current
> directory.
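
To check that I read this correctly, here is the lookup order as I
understand it, written as a small sketch.  This is not xmlproc's code:
the function name resolve_dtd, the plain dictionary standing in for the
catalog, and the fallback to the SYSTEM identifier when the PUBLIC
identifier is not in the catalog are all my own assumptions.

```python
import posixpath
from urllib.parse import urlparse


def resolve_dtd(pubid, sysid, catalog, cwd):
    """Return (how, location) for a DOCTYPE, following the order quoted above.

    catalog is a plain dict mapping PUBLIC identifiers to locations; it is
    a stand-in for a real catalog, used only for illustration.
    """
    # 1. A PUBLIC identifier is looked up in the catalog first.
    if pubid is not None and pubid in catalog:
        return ("catalog", catalog[pubid])
    # Assumption: with no catalog hit we fall back to the SYSTEM identifier.
    if sysid is None:
        return ("unresolved", None)
    # 2. A SYSTEM identifier with a scheme (http:, file:, ...) is a URL.
    if urlparse(sysid).scheme:
        return ("url", sysid)
    # 3. Anything else is treated as a path relative to the current directory.
    return ("file", posixpath.normpath(posixpath.join(cwd, sysid)))


if __name__ == "__main__":
    cat = {"-//W3C//DTD XHTML 1.0 Strict//EN": "DTD/xhtml1-strict.dtd"}
    print(resolve_dtd("-//W3C//DTD XHTML 1.0 Strict//EN",
                      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
                      cat, "/home/user/xmltest"))
    print(resolve_dtd(None, "DTD/xhtml1-strict.dtd", cat, "/home/user/xmltest"))
```

If that reading is right, a document with a PUBLIC identifier that is
listed in the catalog should never need the network or the current
directory to find its DTD.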

Now I still seem to have a problem with validating against the DTD,
although parsing alone, without validation, already works.

This is what fails in my test case:

---
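from xml.parsers.xmlproc import xmlval, catalog

# ParserError and ParserApplication are my own classes, defined
# elsewhere in the full test case.
class Parser2(xmlval.XMLValidator):
	def __init__(self):
		xmlval.XMLValidator.__init__(self)
		self.errorhandler = ParserError(self)
		self.set_error_handler(self.errorhandler)
		self.set_application(ParserApplication(self))
		# Resolve PUBLIC identifiers through the local catalog file.
		cat = catalog.xmlproc_catalog("DTD/catalog", catalog.CatParserFactory())
		self.set_pubid_resolver(cat)
		self.parse_resource("test.html")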
---

Is there something clearly wrong with this?

A complete, stand-alone, simplified test case is available at
<URL:http://www.sci.fi/~pahartik/files/xmltest.tar.gz> for now,
including the Python code, an XHTML file, the DTD catalog and the
related DTD files.

To make things clear, I run it like this:
---
pahartik at zazu:~/coding/python/xmltest$ ./xmltest.py
---

This is the way it breaks:

---
Traceback (innermost last):
  File "./xmltest.py", line 93, in ?
    myxmlparser = Parser2()
  File "./xmltest.py", line 48, in __init__
    self.parse_resource("test.html")
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlval.py", line 32, in parse_resource
    self.parser.parse_resource(sysid)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlutils.py", line 76, in parse_resource
    self.read_from(infile,bufsize)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlutils.py", line 136, in read_from
    self.feed(buf)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlutils.py", line 181, in feed
    self.do_parse()
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlproc.py", line 89, in do_parse
    self.parse_doctype()
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlproc.py", line 477, in parse_doctype
    self.app.handle_doctype(rootname,pub_id,sys_id)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlval.py", line 267, in handle_doctype
    p.parse_resource(join_sysids(self.parser.get_current_sysid(),sys_id))
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlutils.py", line 76, in parse_resource
    self.read_from(infile,bufsize)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlutils.py", line 136, in read_from
    self.feed(buf)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlutils.py", line 181, in feed
    self.do_parse()
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/dtdparser.py", line 226, in do_parse
    self.parse_attlist()
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/dtdparser.py", line 466, in parse_attlist
    self.dtd_consumer.new_attribute(elem,attr,a_type,a_decl,a_def)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmldtd.py", line 220, in new_attribute
    self.elems[elem].add_attr(attr,a_type,a_decl,a_def,self.parser)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmldtd.py", line 269, in add_attr
    self.attrhash[attr]=Attribute(attr,a_type,a_decl,a_def)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmldtd.py", line 367, in __init__
    if error: parser.report_error(2016)                            
NameError: parser
---

I would like to know what was missing this time.

-- 
 "pienena   /  Paavo "Rainbow Rat" Hartikainen
  minusta  /  E-mail: pahartik at sci.fi
  tulee   /  URL: http://www.sci.fi/~pahartik/
  rotta" /  EFnet: pahartik at #Atari and #LionKing


