[XML-SIG] Removing insignificant whitespace
Bob Kline
bkline at rksystems.com
Wed Sep 1 18:47:01 CEST 2004
On Wed, 1 Sep 2004, Brian Quinlan wrote:
> It's not my XML and I don't have a DTD for it.
If you don't have a DTD (or the functional equivalent), then you're out
of luck, because in that case the machine doesn't have any way of
knowing what you mean by "insignificant whitespace." You don't want the
software to assume that every text node which contains only whitespace
is insignificant, even if you have "normalized" the document to collapse
adjacent text nodes into one. Consider:
<Diagnosis>... the <Glossary id='1234'>cancer</Glossary> <Glossary
id='2345'>patient</Glossary> requires ...</Diagnosis>
Would you *really* want the presentation of this text to omit the space
between 'cancer' and 'patient'?
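To make the problem concrete, here is a minimal sketch in Python using the standard library's xml.dom.minidom. The helper name text_of and its drop_blank flag are my own inventions for illustration; they are not part of any library. It gathers the text of the example above twice, once keeping every text node and once skipping the whitespace-only ones.

```python
from xml.dom import minidom

SAMPLE = (
    "<Diagnosis>... the <Glossary id='1234'>cancer</Glossary> "
    "<Glossary id='2345'>patient</Glossary> requires ...</Diagnosis>"
)

def text_of(node, drop_blank=False):
    """Concatenate the text below node (hypothetical helper).

    With drop_blank=True, text nodes containing only whitespace are skipped.
    """
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            if drop_blank and not child.data.strip():
                continue
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            parts.append(text_of(child, drop_blank))
    return "".join(parts)

root = minidom.parseString(SAMPLE).documentElement
kept = text_of(root)                      # every text node retained
dropped = text_of(root, drop_blank=True)  # whitespace-only nodes skipped
print(kept)
print(dropped)
```

The second line printed runs the two glossary terms together, because the single space between the two Glossary elements is itself a whitespace-only text node.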
If you *know* that the documents will never contain such inline markup
(because, for example, you've had a peek at the elusive DTD, and have
been assured that it won't change), then you can write software to take
advantage of this special knowledge. Probably the most straightforward
approach would be an XSLT script with a template that strips whitespace
text nodes and another template which passes everything else through
unscathed.
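For anyone who would rather stay in Python than write the XSLT, the same idea can be sketched against a minidom tree. This swaps a DOM walk in for the stylesheet and is only safe under the assumption stated above, namely that the documents never contain inline markup. The function name strip_blank_text and the record-style sample document are hypothetical.

```python
from xml.dom import minidom

def strip_blank_text(node):
    """Remove whitespace-only text nodes below node, in place.

    Assumes the document has no mixed content, so every such node
    really is insignificant.
    """
    # Iterate over a copy, since removeChild mutates node.childNodes.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            strip_blank_text(child)

# Hypothetical record-style input: indentation only, no inline markup.
doc = minidom.parseString(
    "<Records>\n  <Record>\n    <Name>example</Name>\n  </Record>\n</Records>"
)
strip_blank_text(doc.documentElement)
result = doc.documentElement.toxml()
print(result)
```

Text nodes with any non-whitespace content, along with elements and attributes, are left untouched, which corresponds to the pass-through template in the XSLT approach.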
--
Bob Kline
mailto:bkline at rksystems.com
http://www.rksystems.com