How to parse XHTML with xml.parsers.xmlproc?
Paavo Hartikainen
pahartik at sci.fi
Mon Sep 17 13:22:47 EDT 2001
[[ this message contains excessively long lines, sorry about them ]]
Martin von Loewis writes:
> <head><meta http-equiv="something"></head>
That was the case. I was well aware of this requirement, but the
problem was hidden in my documents: it only occurred while the document
was in memory, so I did not see it in my files before or after
processing.
> As for finding DTDs: If your document contains PUBLIC identifier,
> xmlproc will attempt to search catalogs. If there is only a SYSTEM
> identifier, it will interpret this as an URL. If it looks like a
> relative path name, it will look for the DTD relative to the current
> directory.
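To check that I read this correctly, here is a rough sketch of the lookup
order as I understand it from the description above. It is not xmlproc's
actual code: the function name locate_dtd, the plain dict standing in for
a catalog, and the fall-back to the SYSTEM identifier when the catalog has
no entry are all my own assumptions, and it is written for a current
Python rather than 1.5.

```python
import os
from urllib.parse import urlparse


def locate_dtd(pub_id, sys_id, catalog, base_dir="."):
    """Return a location for a DTD, or None if nothing can be resolved.

    catalog is a plain dict mapping PUBLIC identifiers to locations.
    It is a hypothetical stand-in, not xmlproc's catalog class.
    """
    # 1. A PUBLIC identifier is looked up in the catalog first.
    if pub_id is not None and pub_id in catalog:
        return catalog[pub_id]
    # Assumption: with no catalog hit, fall back to the SYSTEM identifier.
    if sys_id is None:
        return None
    # 2. A SYSTEM identifier that carries a URL scheme is used unchanged.
    if urlparse(sys_id).scheme:
        return sys_id
    # 3. Otherwise it is treated as a path relative to base_dir
    #    (the current directory by default).
    return os.path.normpath(os.path.join(base_dir, sys_id))
```

With a catalog entry present, the PUBLIC identifier wins and the SYSTEM
identifier is never consulted, which is the behaviour I am relying on with
my local DTD/catalog file.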
I still seem to have a problem with validating against the DTD, although
parsing without validation already works.
This is what fails in my test case:
---
from xml.parsers.xmlproc import xmlval, catalog

class Parser2(xmlval.XMLValidator):
    def __init__(self):
        xmlval.XMLValidator.__init__(self)
        self.errorhandler = ParserError(self)
        self.set_error_handler(self.errorhandler)
        self.set_application(ParserApplication(self))
        # Resolve PUBLIC identifiers through the local catalog file.
        cat = catalog.xmlproc_catalog("DTD/catalog", catalog.CatParserFactory())
        self.set_pubid_resolver(cat)
        self.parse_resource("test.html")
---
Is there something clearly wrong with this?
A complete, stand-alone, simplified test case is available at
<URL:http://www.sci.fi/~pahartik/files/xmltest.tar.gz> for now,
including Python code, XHTML file, DTD catalog and related DTD files.
To make things clear, I run it like this:
---
pahartik at zazu:~/coding/python/xmltest$ ./xmltest.py
---
This is the way it breaks:
---
Traceback (innermost last):
  File "./xmltest.py", line 93, in ?
    myxmlparser = Parser2()
  File "./xmltest.py", line 48, in __init__
    self.parse_resource("test.html")
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlval.py", line 32, in parse_resource
    self.parser.parse_resource(sysid)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlutils.py", line 76, in parse_resource
    self.read_from(infile,bufsize)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlutils.py", line 136, in read_from
    self.feed(buf)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlutils.py", line 181, in feed
    self.do_parse()
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlproc.py", line 89, in do_parse
    self.parse_doctype()
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlproc.py", line 477, in parse_doctype
    self.app.handle_doctype(rootname,pub_id,sys_id)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlval.py", line 267, in handle_doctype
    p.parse_resource(join_sysids(self.parser.get_current_sysid(),sys_id))
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlutils.py", line 76, in parse_resource
    self.read_from(infile,bufsize)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlutils.py", line 136, in read_from
    self.feed(buf)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlutils.py", line 181, in feed
    self.do_parse()
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/dtdparser.py", line 226, in do_parse
    self.parse_attlist()
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/dtdparser.py", line 466, in parse_attlist
    self.dtd_consumer.new_attribute(elem,attr,a_type,a_decl,a_def)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmldtd.py", line 220, in new_attribute
    self.elems[elem].add_attr(attr,a_type,a_decl,a_def,self.parser)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmldtd.py", line 269, in add_attr
    self.attrhash[attr]=Attribute(attr,a_type,a_decl,a_def)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmldtd.py", line 367, in __init__
    if error: parser.report_error(2016)
NameError: parser
---
I would like to know what was missing this time.
--
"pienena / Paavo "Rainbow Rat" Hartikainen
minusta / E-mail: pahartik at sci.fi
tulee / URL: http://www.sci.fi/~pahartik/
rotta" / EFnet: pahartik at #Atari and #LionKing