[XML-SIG] xml.dom.minidom.toprettyxml whitespace question

Malcolm Tredinnick malcolm at commsecure.com.au
Tue Jan 4 04:23:31 CET 2005


On Mon, 2005-01-03 at 13:29 -0500, Andy Meyer wrote:
> Hello all,
> 
> I have a question about the xml.dom.minidom.toprettyxml method's 
> insertion of whitespace into text elements, e.g. 
> '<foo><bar>Hello!</bar></foo>' getting transformed by toprettyxml to:
> 
> <foo>
>     <bar>
>          Hello!
>     </bar>
> </foo>
> 
> with the addition of tabs and newlines around 'Hello!', instead of:
> 
> <foo>
>     <bar>Hello!</bar>
> </foo>
> 
> Since a SAX-style parser would read the second example as identical to 
> the raw XML, to me the second way is more correct than the first, but 
> I'm new to XML and handling whitespace seems to be an unresolved issue. 
> Is this behavior by design?

Rich has already answered your question, but I thought I would just
point out that, in fact, the second example would not generally produce
the same SAX events as the raw XML. For the raw XML, you would see
(using a rough summary of SAX events):

        - start "foo" element
        - start "bar" element
        - characters ("Hello!")
        - end "bar" element
        - end "foo" element

whereas the second layout will produce

        - start "foo" element
        - characters (newline + tabs or spaces)
        - start "bar" element
        - characters ("Hello!")
        - end "bar" element
        - characters (newline)
        - end "foo" element

In other words, the SAX parser will not normally discard the whitespace
your eye glosses over; it cannot tell that it is not significant.
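
If you want to check this yourself, here is a minimal sketch using
Python's xml.sax. The EventRecorder class and the sax_events helper are
names I made up for illustration; only ContentHandler and parseString
come from the library. A parser is allowed to deliver one run of text in
several characters() calls, so the recorder merges adjacent chunks to
keep the two event lists comparable.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Hypothetical helper: records SAX events as (kind, value) tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Merge with the previous event if it was also character data,
        # since the parser may split one text run into several calls.
        if self.events and self.events[-1][0] == "characters":
            self.events[-1] = ("characters", self.events[-1][1] + content)
        else:
            self.events.append(("characters", content))


def sax_events(xml_text):
    """Parse xml_text and return the list of recorded events."""
    recorder = EventRecorder()
    xml.sax.parseString(xml_text.encode("utf-8"), recorder)
    return recorder.events


raw = "<foo><bar>Hello!</bar></foo>"
pretty = "<foo>\n    <bar>Hello!</bar>\n</foo>"

for label, doc in (("raw", raw), ("pretty", pretty)):
    print(label)
    for event in sax_events(doc):
        print("   ", event)
```

The "pretty" document should show two extra characters events holding
only whitespace, one after the start of "foo" and one before its end.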

Cheers,
Malcolm


