xml.dom.minidom question

Sun Nov 20 02:22:02 EST 2011

On Sat, 19 Nov 2011 15:32:18 -0600, nivashno wrote:

> I always thought that xml was very precisely split up into nodes, 
> childnodes, etc, no matter what the whitespace between them was. But 
> apparently not, or am I missing something?

XML allows mixed content (an element's children can be a mixture of text
and elements). Formats such as XHTML wouldn't be possible otherwise.
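
As a small illustration (the XHTML-like fragment below is made up for
this example and is not from the original question), minidom reports
mixed content as interleaved text and element nodes. Here the
whitespace-only text node between the two inline elements is
significant, because dropping it would run the words together:

```python
from xml.dom import minidom

# Hypothetical XHTML-style fragment with mixed content.
doc = minidom.parseString("<p>Hello <b>big</b> <i>world</i></p>")
p = doc.documentElement

# Summarise each child: node name, plus the text for text nodes.
parts = [
    (n.nodeName, n.data if n.nodeType == n.TEXT_NODE else None)
    for n in p.childNodes
]
print(parts)
```

The second "#text" entry holds only a single space, yet it is part of
the document's content.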

A validating parser will know from the schema whether an element can
contain mixed content, and can use this knowledge to elide whitespace-only
text nodes within elements which don't have mixed content. However, that
doesn't mean that it will, or even that it should; some applications may
prefer to retain the whitespace in order to preserve formatting.

A non-validating parser (which doesn't use a schema) doesn't know whether
an element contains mixed content, so it has to retain all text nodes in
case they're significant.
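
A rough sketch of what this looks like with minidom, assuming a small
indented document invented for the example. The helper
strip_whitespace_nodes() is hypothetical, not part of minidom; it only
makes sense if you already know that the elements involved never
contain mixed content, since it would also discard significant
whitespace:

```python
from xml.dom import minidom

# Made-up input: indentation and newlines between the child elements.
doc = minidom.parseString("<root>\n  <a>1</a>\n  <b>2</b>\n</root>")
root = doc.documentElement

# The whitespace between elements shows up as '#text' children.
before = [n.nodeName for n in root.childNodes]

def strip_whitespace_nodes(node):
    """Recursively remove whitespace-only text nodes below node.

    Hypothetical helper; assumes no element has mixed content.
    """
    # Iterate over a copy, since we modify childNodes as we go.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
            child.unlink()
        elif child.nodeType == child.ELEMENT_NODE:
            strip_whitespace_nodes(child)

strip_whitespace_nodes(root)
after = [n.nodeName for n in root.childNodes]

print(before)
print(after)
```

Text nodes that contain anything other than whitespace, such as the
"1" inside <a>, are left alone.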

The Python standard library doesn't include a validating XML parser.
xmlproc seems to be the preferred validating parser. That has a separate
handle_ignorable_data() method for reporting whitespace-only text nodes
within non-mixed-content elements; the handle_data() method is only called
for "significant" text.