[XML-SIG] Removing insignificant whitespace

Bob Kline bkline at rksystems.com
Wed Sep 1 18:47:01 CEST 2004


On Wed, 1 Sep 2004, Brian Quinlan wrote:

> It's not my XML and I don't have a DTD for it.

If you don't have a DTD (or the functional equivalent), then you're out 
of luck, because in that case the machine has no way of knowing what 
you mean by "insignificant whitespace."  You don't want the 
software to assume that every text node which contains only whitespace 
is insignificant, even if you have "normalized" the document to collapse 
adjacent text nodes into one.  Consider:

<Diagnosis>... the <Glossary id='1234'>cancer</Glossary> <Glossary
id='2345'>patient</Glossary> requires ...</Diagnosis>

Would you *really* want the presentation of this text to omit the space 
between 'cancer' and 'patient'?
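
To make the problem concrete, here is a minimal sketch in Python using the
standard library's xml.etree.ElementTree. The helper names (text_of,
drop_whitespace_only) are made up for this illustration; the point is only
that blindly discarding whitespace-only text glues the two glossary terms
together.

```python
import xml.etree.ElementTree as ET

# The example document from above, written as one Python string.
doc = ("<Diagnosis>... the <Glossary id='1234'>cancer</Glossary> "
       "<Glossary id='2345'>patient</Glossary> requires ...</Diagnosis>")

def text_of(root):
    """Concatenate all character data in document order."""
    return "".join(root.itertext())

def drop_whitespace_only(root):
    """Naively discard every text/tail that contains only whitespace."""
    for elem in root.iter():
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None

root = ET.fromstring(doc)
before = text_of(root)   # the space between the two terms is still there
drop_whitespace_only(root)
after = text_of(root)    # the space is gone: 'cancer' runs into 'patient'

print(before)
print(after)
```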

If you *know* that the documents will never contain such inline markup
(because, for example, you've had a peek at the elusive DTD, and have
been assured that it won't change), then you can write software to take
advantage of this special knowledge.  Probably the most straightforward
approach would be an XSLT script with a template that strips whitespace
text nodes and another template which passes everything else through
unscathed.
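
If you would rather stay in Python than write the XSLT, roughly the same
effect can be had with a small DOM walk. This is a sketch, not the XSLT
approach itself: it assumes xml.dom.minidom from the standard library, the
function name strip_whitespace_nodes and the sample <Record> document are
invented for the example, and it is only safe under the "no inline markup"
assumption described above.

```python
from xml.dom import minidom, Node

def strip_whitespace_nodes(node):
    """Remove whitespace-only text nodes below `node`; keep everything else."""
    # Iterate over a copy, since we modify childNodes while walking.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)
    return node

# Hypothetical data-style document with indentation between elements.
source = """<Record>
  <Name>A</Name>
  <City>B</City>
</Record>"""

dom = minidom.parseString(source)
dom.normalize()              # merge adjacent text nodes first
strip_whitespace_nodes(dom)
result = dom.documentElement.toxml()
print(result)
```

Text nodes with any non-whitespace content, attributes, comments, and CDATA
sections are left alone; only the indentation between elements goes away.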

-- 
Bob Kline
mailto:bkline at rksystems.com
http://www.rksystems.com
