[Expat-discuss] Ignoring whitespaces while parsing XML with expat

Nick MacDonald nickmacd at gmail.com
Wed Aug 11 04:51:46 CEST 2010


Tarun:
> Is there a way to ignore unneeded whitespace (like that introduced by pretty-printing XML) while parsing the XML with the expat parser?

http://www.w3.org/TR/REC-xml/#sec-white-space

[quote]
In editing XML documents, it is often convenient to use "white space"
(spaces, tabs, and blank lines) to set apart the markup for greater
readability. Such white space is typically not intended for inclusion
in the delivered version of the document. On the other hand,
"significant" white space that should be preserved in the delivered
version is common, for example in poetry and source code.

An XML processor MUST always pass all characters in a document that
are not markup through to the application. A validating XML processor
MUST also inform the application which of these characters constitute
white space appearing in element content.
[end quote]

While it might be handy for Expat to have a flag or mode that removes
the white space that looks optional to you, that would run counter to
the spec (as quoted above).  So you, being the author of the
"application" that the spec mentions, must deal with the white space
on your own.  This shouldn't actually be too hard, but you will
probably need a good set of test cases to make sure the results you
get are what you really want.
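
As a rough sketch of what "dealing with it on your own" could look
like, here is one possible approach, written against Python's standard
xml.parsers.expat binding to keep it short.  The function name
parse_ignoring_blank_text and the (event, value) tuples are made up for
illustration, and the rule "text consisting only of white space is
pretty-printing" is an assumption that will not hold for every
document.  Character data is buffered between tags because Expat may
deliver a single run of text in several callbacks.

```python
import xml.parsers.expat


def parse_ignoring_blank_text(xml_text):
    """Parse xml_text and return (event, value) tuples.

    Hypothetical helper: whitespace-only character data is dropped,
    all other text is passed through unchanged.
    """
    events = []
    pending = []  # character data chunks seen since the last tag

    def flush():
        # Join the chunks first: one text run can arrive in pieces.
        text = "".join(pending)
        pending.clear()
        # Assumption: a run made only of XML white space (space, tab,
        # CR, LF) is formatting and can be discarded.
        if text.strip(" \t\r\n"):
            events.append(("text", text))

    def start_element(name, attrs):
        flush()
        events.append(("start", name))

    def end_element(name):
        flush()
        events.append(("end", name))

    def character_data(data):
        pending.append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = character_data
    parser.Parse(xml_text, True)  # True marks the final chunk of input
    return events
```

Note that text with surrounding spaces, such as " x ", is kept as-is;
only runs that are entirely white space are dropped.  That rule is
exactly where the test cases come in: in mixed content like
"<p>Hello <b>big</b> <i>world</i></p>" the single space between </b>
and <i> is significant, yet this sketch would discard it, and it also
ignores xml:space="preserve".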

Good luck,
  Nick

