[XML-SIG] Parser not preserving DTD?

Matthew Shomphe Matthews@heyanita.com
Wed, 4 Sep 2002 16:42:29 -0700


I've done a few tests to see where the mangled DTDs are coming from. I
can't report much success beyond the following:
1. The problem is not with pyexpat or Expat. I ran some tests, and the
full DTD is passed to pyexpat.

I added the following code to test_pyexpat.py:

    def StartDoctypeDeclHandler(self, *args):
        # pyexpat passes (doctypeName, systemId, publicId, has_internal_subset)
        doctypeName, systemId, publicId, has_internal_subset = args
        print('DTD declared:', args)

The full DTD was printed to stdout.
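For anyone who wants to repeat this check outside test_pyexpat.py, here is a minimal standalone sketch. It assumes a current Python 3 with the standard-library xml.parsers.expat module; the function name doctype_args and the sample document are hypothetical, made up for illustration. It records the arguments Expat hands to StartDoctypeDeclHandler.

```python
import xml.parsers.expat

# Hypothetical sample document with both a SYSTEM id and an internal subset.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd" [
  <!ELEMENT note (#PCDATA)>
]>
<note>hello</note>
"""


def doctype_args(data):
    """Return the argument tuples pyexpat passes to StartDoctypeDeclHandler."""
    parser = xml.parsers.expat.ParserCreate()
    seen = []

    def start_doctype(doctype_name, system_id, public_id, has_internal_subset):
        # Called once, when the <!DOCTYPE ...> declaration begins.
        seen.append((doctype_name, system_id, public_id, has_internal_subset))

    parser.StartDoctypeDeclHandler = start_doctype
    parser.Parse(data, True)  # True marks this as the final chunk
    return seen


if __name__ == "__main__":
    print(doctype_args(DOC))
```

By default pyexpat does not fetch the external "note.dtd" subset, so the sketch runs without that file being present.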

2. SAX does not natively support <!DOCTYPE> declarations. From the SAX
project FAQ (http://www.saxproject.org/?selected=faq):
----
Does SAX support comments/CDATA sections/DOCTYPE declarations, etc.?
	Not in the core API. These kinds of things are pure lexical details,
and are not relevant to most kinds of XML processing, so it doesn't make
sense to put them in the core and force all implementors to support
them.
	However, SAX2 is designed to be extensible, and the LexicalHandler
interface is supported by most SAX parsers. SAX2 parsers are not
required to support this handler, but they are required to report an
error if you try to use handlers they don't support.
----
(<!NOTATION> declarations and unparsed entities are supported, through
the DTDHandler interface.)
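The DTD declarations that core SAX2 does report reach a DTDHandler. The following is a minimal sketch, assuming Python 3's standard-library xml.sax with its default Expat-based reader (which may behave differently from the PyXML reader discussed in this thread); the class DeclRecorder and the sample document are hypothetical. It collects NOTATION and unparsed-entity declarations from an internal subset.

```python
import io
import xml.sax
from xml.sax.handler import DTDHandler

# Hypothetical document declaring a notation and an unparsed (NDATA) entity.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
  <!ELEMENT doc EMPTY>
]>
<doc/>
"""


class DeclRecorder(DTDHandler):
    """Records the two declaration types the core SAX2 DTDHandler reports."""

    def __init__(self):
        super().__init__()
        self.notations = []
        self.unparsed = []

    def notationDecl(self, name, publicId, systemId):
        self.notations.append((name, publicId, systemId))

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self.unparsed.append((name, publicId, systemId, ndata))


def collect_decls(data):
    parser = xml.sax.make_parser()
    recorder = DeclRecorder()
    parser.setDTDHandler(recorder)
    parser.parse(io.BytesIO(data))
    return recorder


if __name__ == "__main__":
    rec = collect_decls(DOC)
    print(rec.notations)
    print(rec.unparsed)
```

Note that the <!ELEMENT> declaration is not reported here at all; that is part of the DTD detail core SAX leaves out.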
3. The LexicalHandler mentioned above does seem to support DTDs (it
defines startDTD and endDTD callbacks), but I have no idea how to
implement this.
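As a rough illustration of how a LexicalHandler gets attached, here is a sketch assuming Python 3's standard-library xml.sax, whose Expat-based reader accepts a lexical handler through the xml.sax.handler.property_lexical_handler property. The class DoctypeRecorder, the function collect_lexical_events, and the sample document are hypothetical. The handler is duck-typed (it just provides the five SAX2 LexicalHandler methods) so it does not depend on a base class being available.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, property_lexical_handler

# Hypothetical document with an internal-subset DTD and a comment.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
]>
<!-- kept -->
<note>hello</note>
"""


class DoctypeRecorder:
    """Duck-typed SAX2 LexicalHandler that records DOCTYPE and comment events."""

    def __init__(self):
        self.events = []

    def startDTD(self, name, public_id, system_id):
        self.events.append(("startDTD", name, public_id, system_id))

    def endDTD(self):
        self.events.append(("endDTD",))

    def startCDATA(self):
        pass  # CDATA boundaries are ignored in this sketch

    def endCDATA(self):
        pass

    def comment(self, content):
        self.events.append(("comment", content))


def collect_lexical_events(data):
    parser = xml.sax.make_parser()
    parser.setContentHandler(ContentHandler())
    recorder = DoctypeRecorder()
    # Register the lexical handler; readers that lack support should raise
    # SAXNotRecognizedException or SAXNotSupportedException here.
    parser.setProperty(property_lexical_handler, recorder)
    parser.parse(io.BytesIO(data))
    return recorder.events


if __name__ == "__main__":
    for event in collect_lexical_events(DOC):
        print(event)
```

Even with this handler, startDTD only delivers the name and the public/system ids; the individual declarations inside the subset are not passed along, which may be part of why DTDs come out truncated.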

In short, data are being lost somewhere along the processing route. I'm
not well-versed in the APIs of these libraries, so I'm struggling to
track down the methods and attributes needed to pass the DTD all the way
through. The problem seems to lie with SAX2: it has an extension for
this, but that extension just hasn't been implemented yet.
Is there another reader out there that doesn't truncate DTDs and returns
a full DOM?

Thanks,
Matt