[XML-SIG] pyexpat: Comments before DOCTYPE

Ingo van Lil inguin at gmx.de
Mon Feb 13 15:30:42 CET 2006


Hello there,

I ran into a minor problem using the xml.dom.minidom XML parser: An XML
document having a comment before a DOCTYPE node seems to leave the DOM
data structures in an inconsistent state.

Let's say I have a little test.xml file:

    <?xml version="1.0"?>
    <!-- comment -->
    <!DOCTYPE test SYSTEM "test.dtd">
    <test> <tag2> Hello world </tag2> </test>

and a little Python program to parse it:

    from xml.dom.minidom import parse
    dom = parse("test.xml")
    print "document node:", dom
    print len(dom.childNodes), "children"
    print "first child:", dom.firstChild
    print "next sibling:", dom.firstChild.nextSibling

The output of that program is:

    document node: <xml.dom.minidom.Document instance at 0xb7b82b6c>
    3 children
    first child: <DOM Comment node " comment ">
    next sibling: None

That is, the document node does have three children (a comment node, a
DocumentType instance and an element), but the first child's nextSibling
is None rather than the DocumentType node. This breaks my algorithm, which
is supposed to walk the entire DOM tree recursively but stops after the
first node instead.
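
A traversal that iterates over childNodes instead of following nextSibling
does not depend on the sibling pointers at all. The following is only a
minimal sketch of that idea, not the original algorithm: it assumes the
document is supplied as an inline string rather than read from test.xml,
and the names XML and walk are illustrative.

```python
from xml.dom.minidom import parseString

# Same content as test.xml above, inlined so the sketch is self-contained.
XML = """<?xml version="1.0"?>
<!-- comment -->
<!DOCTYPE test SYSTEM "test.dtd">
<test> <tag2> Hello world </tag2> </test>
"""


def walk(node, depth=0):
    """Yield (depth, node) pairs in document order, using childNodes only."""
    yield depth, node
    for child in node.childNodes:
        # Recurse into each child; nextSibling is never consulted.
        for item in walk(child, depth + 1):
            yield item


dom = parseString(XML)
visited = [(depth, node.nodeName) for depth, node in walk(dom)]
for depth, name in visited:
    print("  " * depth + name)
```

Since childNodes reports all three children of the document node, this
walk should reach the DocumentType node and the root element even when
nextSibling on the comment is None.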

I'm not entirely sure whether this really is a bug in pyexpat or an
error in my XML file. I haven't found any hints as to whether an XML
document is allowed to have a comment before the DOCTYPE declaration.
xmllint doesn't seem to complain about it, though.
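
For reference, the XML 1.0 grammar defines the prolog as
"XMLDecl? Misc* (doctypedecl Misc*)?", where Misc covers comments, so a
comment before the DOCTYPE declaration is permitted. One way to check
well-formedness independently of the DOM layer is to feed the document to
expat directly. This is a minimal sketch under the assumption that the
document is given as an inline byte string; the helper name
is_well_formed is hypothetical, and the check covers well-formedness only,
not validity against test.dtd.

```python
import xml.parsers.expat

# Same content as test.xml above, as bytes.
XML = b"""<?xml version="1.0"?>
<!-- comment -->
<!DOCTYPE test SYSTEM "test.dtd">
<test> <tag2> Hello world </tag2> </test>
"""


def is_well_formed(data):
    """Return True if expat parses data without raising ExpatError."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


print(is_well_formed(XML))
```

If expat accepts the input here, the inconsistency would lie in how the
DOM tree is built on top of the parser rather than in the XML file.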

        Cheers,
            Ingo


