[XML-SIG] Pretty-printing DOM trees

A.M. Kuchling akuchlin@cnri.reston.va.us
Wed, 20 Jan 1999 22:45:46 -0500


The format() function below pretty-prints a DOM tree.  It first strips
the existing whitespace from the tree, and then inserts Text nodes
containing newlines and indentation, producing output like this:

<?xml version="1.0"?>
<?IS10744:arch name="xsa"?>
<HTML>
    <HEAD>
        <TITLE>xmlproc: A Python XML parser</TITLE>
        <META xsa='last-release' VALUE='19980718'/>
    </HEAD>
    <BODY>
        <H1>
            <SPAN xsa='name'>xmlproc</SPAN>: A Python XML parser
        </H1>
    </BODY>
</HTML>

Should this be left as just a black-box function, or should it be
implemented as a subclass of the writer.XmlWriter class?  I suppose
it depends on the envisioned application for this; if it's just to
make output a little bit more readable for debugging purposes, then
customizability isn't very important.  On the other hand, if people
will want to do careful indenting of the output, indenting some tags
and not others, then the XmlWriter solution is the way to go.
My inclination is to the former view, but then, that's also easier for 
me. :)  Thoughts?
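
To make the second option a bit more concrete, here is a rough sketch
of what a customizable formatter could look like.  It does not use the
real writer.XmlWriter API; the IndentingFormatter class, its
should_indent() hook and the KeepInline subclass are all hypothetical
names, and the sketch assumes the standard library's xml.dom.minidom
and an input tree that contains no whitespace-only Text nodes.

```python
from xml.dom import minidom, Node


class IndentingFormatter:
    """Hypothetical formatter; subclasses decide which elements get indented."""

    def __init__(self, indent=4):
        self.indent = indent

    def should_indent(self, element):
        # Hook for subclasses: return False to leave an element's
        # content exactly as it is.
        return True

    def format(self, node, depth=0):
        if not self.should_indent(node):
            return
        document = node.ownerDocument
        added = False
        # Iterate over a copy, since Text nodes are inserted while looping.
        for child in list(node.childNodes):
            if child.nodeType == Node.ELEMENT_NODE:
                pad = "\n" + " " * (depth + 1) * self.indent
                node.insertBefore(document.createTextNode(pad), child)
                self.format(child, depth + 1)
                added = True
        if added:
            # Put the closing tag on its own line, at this node's depth.
            pad = "\n" + " " * depth * self.indent
            node.appendChild(document.createTextNode(pad))


class KeepInline(IndentingFormatter):
    """Example subclass: never reformat the content of <H1> elements."""

    def should_indent(self, element):
        return element.tagName != "H1"


doc = minidom.parseString(
    "<BODY><H1><SPAN>xmlproc</SPAN>: a parser</H1><P><B>x</B></P></BODY>")
KeepInline().format(doc.documentElement)
formatted = doc.documentElement.toxml()
print(formatted)
```

With a hook like this, "indent some tags and not others" becomes a
one-method subclass, at the cost of a larger interface than a single
black-box function.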

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
We have first raised a dust and then complain we cannot see.
    -- Bishop Berkeley


from xml.dom import utils, core

d = utils.FileReader()
dom = d.readFile( '/scratch/xsademo.xml' )

def format(node, indent=4):
    """Pretty-print a DOM tree, modifying it in place.

    Child elements are indented by 'indent' spaces per nesting level.
    """

    utils.strip_whitespace( node )

    if node.nodeType == core.DOCUMENT_NODE:
        node = node.documentElement

    stack = [ (0,node) ]

    document = node.get_ownerDocument()

    # Add a newline before the opening and closing tags of the root element
    parent = node.get_parentNode()
    if parent is not None:
        # a detached element has no parent to hold the leading newline
        parent.insertBefore( document.createTextNode('\n'), node )
    node.appendChild( document.createTextNode('\n') )
    
    while stack:
        # get the top node from the stack
        depth, node = stack[-1]

        # walk this node's list of children, adding indentation around
        # element children and saving every child that has children of
        # its own to be pushed onto the stack
        children = []
        for child in node.childNodes[:]:
            if child.nodeType == core.ELEMENT_NODE:
                spacing = '\n' + (' '*(depth+1)*indent)

                # Add spacing before the child element; this space goes before
                # the start tag.
                text = document.createTextNode( spacing )
                node.insertBefore( text, child )

                # Check if the child element has any element children; if so,
                # we'll add whitespace before the closing tag.
                has_element_children = 0
                for n in child.get_childNodes():
                    if n.nodeType == core.ELEMENT_NODE:
                        has_element_children=1

                if has_element_children:
                    # Add spacing as the last child of the child element; this
                    # will go before the closing tag.
                    text = document.createTextNode( spacing )
                    child.appendChild( text )

            if child.hasChildNodes():
                children.append ( (depth+1,child) )
        children.reverse()
        stack[-1:] = children
        
    # end: while stack not empty

format(dom)

print dom.toxml()
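
For anyone without the xml.dom.utils and core modules used above,
roughly the same strip-then-reinsert approach can be sketched against
the standard library's xml.dom.minidom.  This is an approximation, not
a port of the code above: the strip_whitespace() and pretty() functions
here are my own illustrative helpers, written recursively rather than
with an explicit stack, and the input string is made up.

```python
from xml.dom import minidom, Node


def strip_whitespace(node):
    """Remove child Text nodes that contain only whitespace, recursively."""
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace(child)


def pretty(node, indent=4, depth=0):
    """Insert whitespace Text nodes so element children sit on their own lines."""
    document = node.ownerDocument
    has_element_children = False
    # Iterate over a copy, since Text nodes are inserted while looping.
    for child in list(node.childNodes):
        if child.nodeType == Node.ELEMENT_NODE:
            has_element_children = True
            spacing = "\n" + " " * (depth + 1) * indent
            # This spacing goes before the child's start tag.
            node.insertBefore(document.createTextNode(spacing), child)
            pretty(child, indent, depth + 1)
    if has_element_children:
        # This spacing goes before the node's own closing tag.
        node.appendChild(document.createTextNode("\n" + " " * depth * indent))


doc = minidom.parseString("<a> <b>text</b> <c> <d/> </c> </a>")
strip_whitespace(doc)
pretty(doc.documentElement)
result = doc.documentElement.toxml()
print(result)
```

As in format(), a closing tag only moves to its own line when the
element has at least one element child, so elements holding only text
stay on a single line.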