[martin@loewis.home.cs.tu-berlin.de: Re: [XML-SIG] Thought it was a bug, maybe XML is weirder than I thought]

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Mon, 2 Oct 2000 01:33:12 +0200


> On further reflection, I can see that my previous concern about two
> original TEXT children of <username> was nonsensical (if they were
> really distinct, they should be elements), but nonetheless, the
> lesson about having to concatenate all TEXT children to get the
> original text value seems to be true.

I think you have a point on splitting a text fragment into multiple
Text nodes; the DOM spec says about the interface Text:

# If there is no markup inside an element's content, the text is
# contained in a single object implementing the Text interface that is
# the only child of the element. If there is markup, it is parsed into
# a list of elements and Text nodes that form the list of children of
# the element.

# When a document is first made available via the DOM, there is only
# one Text node for each block of text. Users may create adjacent Text
# nodes that represent the contents of a given element without any
# intervening markup, but should be aware that there is no way to
# represent the separations between these nodes in XML or HTML, so
# they will not (in general) persist between DOM editing sessions. The
# normalize() method on Element [p.38] merges any such adjacent Text
# objects into a single node for each block of text; this is
# recommended before employing operations that depend on a particular
# document structure, such as navigation with XPointers.

[from REC-DOM-Level-1-19981001]

I'm not sure what that means for parsing &lt;hallo&gt; - is it
permitted that these are split into three Text nodes, is it required
that they are split, or is it disallowed?

Section 2.4 of XML 1.0 [REC-xml-19980210] says that an entity
reference is markup; section 4.1 says that &gt; is an entity reference
(*not* a character reference). So it appears to be permitted for a
parser to create multiple Text nodes here.
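If splitting is permitted, code that reads an element's text should
not assume a single Text child. A minimal sketch of the concatenation
approach follows. It assumes Python's standard xml.dom.minidom rather
than the PyXML DOM discussed in this thread, and text_of is a
hypothetical helper name, not part of any DOM API. minidom may well
deliver a single Text node for this input; the helper gives the same
result either way.

```python
from xml.dom import Node, minidom


def text_of(element):
    """Join the data of all direct Text and CDATA children of element.

    Hypothetical helper: works whether the parser produced one Text
    node or several adjacent ones.
    """
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )


doc = minidom.parseString("<username>&lt;hallo&gt;</username>")
value = text_of(doc.documentElement)
```

The helper only looks at direct children; text inside nested elements
would need a recursive walk.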

You *should* be able to merge them by calling normalize() on the tree.
I'm not sure whether that worked in 0.5.5.1, but it does work with
4DOM in PyXML 0.6. Please note that normalize() won't merge CDATA
sections.
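A small sketch of that behaviour, again assuming the standard
xml.dom.minidom as a stand-in for 4DOM (the two implementations may
differ in detail). The adjacent Text nodes are created by hand here,
since nothing guarantees that a given parser would split the text
this way.

```python
from xml.dom import minidom

doc = minidom.parseString("<username/>")
user = doc.documentElement

# Simulate a parser that split the text into three Text nodes.
for piece in ("<", "hallo", ">"):
    user.appendChild(doc.createTextNode(piece))

# Add a CDATA section after the text to show it is left alone.
user.appendChild(doc.createCDATASection("raw"))

before = len(user.childNodes)

# Merge adjacent Text nodes in place.
user.normalize()

after = len(user.childNodes)
merged_text = user.firstChild.data
```

After normalize() the three Text nodes are one, while the CDATA
section remains a separate child.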

Regards,
Martin