[XML-SIG] Parser not preserving DTD?

Matthew Shomphe Matthews@heyanita.com
Tue, 3 Sep 2002 09:25:00 -0700


I've noticed that two different methods of parsing an XML grammar have
both yielded outputs with DTDs different from the input DTD.  For example,
given the input:

<?xml version="1.0" encoding="ISO8859-1"?>
<!DOCTYPE grammar SYSTEM "http://www.w3.org/TR/speech-grammar/grammar.dtd">
<grammar version="1.0" mode="voice" root="ROOT" xmlns="http://www.w3.org/2001/06/grammar">
	<rule id=3D"ROOT">
		<item>test</item>
	</rule>
</grammar>

And the following code:

#! "D:\Python22\python.exe"
import sys
from xml.dom.ext.reader import Sax2
from xml.dom.ext    import PrettyPrint, Print

if __name__ == "__main__":
    usage = "Usage: " + sys.argv[0] + " <input_XML> [output_XML]\nDefault output to STDOUT\n"
    try:
        sInFile = open(sys.argv[1], "r")
    except IndexError:
        sys.stderr.write(usage)
        sys.exit()
    try:
        sOutFile = open(sys.argv[2], "w")
    except IndexError:
        sOutFile = sys.stdout
    reader = Sax2.Reader()
    doc = reader.fromStream(sInFile)
    PrettyPrint(doc, sOutFile)

The following will be output:

<?xml version="1.0" encoding="ISO8859-1"?>
<!DOCTYPE grammar>
<grammar version="1.0" mode="voice" root="ROOT" xmlns="http://www.w3.org/2001/06/grammar">
	<rule id=3D"ROOT">
		<item>test</item>
	</rule>
</grammar>

Is this a bug?  If not, how can I preserve the DOCTYPE declaration (in
particular its SYSTEM identifier) when reading in and manipulating a
document?
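
For comparison, here is a rough way to check whether the system identifier
survives parsing at all, before any printing happens.  This is only a sketch
and rests on an assumption: it uses the standard library's xml.dom.minidom
rather than the PyXML Sax2 reader from my script, so it does not say anything
about what Sax2.Reader or PrettyPrint do internally.  The helper name
doctype_ids is made up for illustration, and the sample omits the encoding
declaration to keep the test string simple.

```python
from xml.dom import minidom

# Same grammar as above, minus the encoding declaration (an
# illustrative simplification, not part of the original input).
SOURCE = """<?xml version="1.0"?>
<!DOCTYPE grammar SYSTEM "http://www.w3.org/TR/speech-grammar/grammar.dtd">
<grammar version="1.0" mode="voice" root="ROOT" xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="ROOT">
    <item>test</item>
  </rule>
</grammar>
"""


def doctype_ids(xml_text):
    """Return (name, publicId, systemId) of the parsed doctype, or None.

    Hypothetical helper: it only inspects the DOM DocumentType node
    that minidom builds; it does not fetch or validate the DTD.
    """
    doc = minidom.parseString(xml_text)
    doctype = doc.doctype
    if doctype is None:
        return None
    return (doctype.name, doctype.publicId, doctype.systemId)


if __name__ == "__main__":
    # Show what the parser kept from the DOCTYPE declaration.
    print(doctype_ids(SOURCE))
    # Show whether the serializer writes the system identifier back out.
    print(minidom.parseString(SOURCE).toxml())
```

If the system identifier is present on doc.doctype here but missing from the
Sax2/PrettyPrint output, that would point at the reader or the printer rather
than at the input document.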

Thanks in advance,
Matt Shomphe

--------------
Matt Shomphe
MatthewS@HeyAnita.com