toprettyxml messes up with whitespaces

"Martin v. Löwis" martin at v.loewis.de
Wed Oct 3 12:41:28 EDT 2007


> <quote>
> Such white space is typically not intended for inclusion in the
> delivered version of the document. On the other hand, "significant"
> white space that should be preserved in the delivered version is
> common, for example in poetry and source code.
> </quote>
> 
> I interpret "significant" whitespaces as the ones between the words,

This interpretation is incorrect. It's not really possible to tell what
whitespace is significant from looking just at the document; the
classification into "significant" and "insignificant" is up to the
application, not the XML processor.

There is also the concept of "ignorable" white space in SAX (and other
APIs); this means white space in element content. The XML
recommendation supports it with the sentence
"A validating XML processor MUST also inform the application which of
these characters constitute white space appearing in element content."
(You can only know whether white space is in element content if you
validate.)
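
To see this in Python's own SAX binding, here is a minimal sketch. It
assumes the default expat-based reader, which does not validate; the
handler class and the sample document are made up for illustration.
Since a non-validating parser cannot tell element content from mixed
content, I would expect the indentation to arrive through characters()
and nothing through ignorableWhitespace().

```python
import xml.sax


class WhitespaceRecorder(xml.sax.ContentHandler):
    """Hypothetical handler that records which callback delivers text."""

    def __init__(self):
        super().__init__()
        self.chars = []      # data reported via characters()
        self.ignorable = []  # data reported via ignorableWhitespace()

    def characters(self, content):
        self.chars.append(content)

    def ignorableWhitespace(self, whitespace):
        self.ignorable.append(whitespace)


handler = WhitespaceRecorder()
xml.sax.parseString(b"<root>\n  <a>x</a>\n</root>", handler)

# The parser may deliver text in several chunks, so join them first.
all_text = "".join(handler.chars)
print(repr(all_text))
print(handler.ignorable)
```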

> We can assume that the spaces in front of it are not significant
> whitespaces.

No, we cannot. Maybe your application can assume that; the XML
processor cannot. In fact, the XML recommendation FORBIDS the XML
processor from stripping white space.
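
A small sketch of what that means for minidom (the sample document is
made up for illustration): the indentation around <a> is handed to the
application as ordinary text nodes, and it is up to the application to
decide whether it matters.

```python
from xml.dom import minidom

doc = minidom.parseString("<root>\n  <a>x</a>\n</root>")
root = doc.documentElement

# Collect the data of every text node that is a direct child of <root>.
text_children = [n.data for n in root.childNodes if n.nodeType == n.TEXT_NODE]
print(text_children)
```

An application that knows the indentation is irrelevant can drop such
nodes itself before writing the document out again.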

> (etc) .. so when reading, modifying, writing XML files, the empty
> blank lines will grow exponentially.

Not sure why you keep saying that the growth is exponential; I believe
it's linear in the number of read-write cycles.
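
You can check this yourself with a rough sketch like the following. The
sample document and the choice of five cycles are arbitrary; it counts
the newlines in the output after each parse/toprettyxml round trip. If
the growth is linear, the difference between consecutive counts should
stay constant instead of increasing.

```python
from xml.dom import minidom

source = "<root>\n  <a>x</a>\n</root>"
newline_counts = []

for _ in range(5):
    # One read-write cycle: parse, then pretty-print the result.
    source = minidom.parseString(source).toprettyxml()
    newline_counts.append(source.count("\n"))

# Increase in newline count from each cycle to the next.
growth = [b - a for a, b in zip(newline_counts, newline_counts[1:])]
print(newline_counts)
print(growth)
```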

> I would think (simplistic I'm sure) that if spaces are that important,
> you can always use a CDATA tag which should treat the text inside as
> raw data without any formatting and whitespace changes.

That is definitely simplistic. CDATA has no bearing on formatting; it
is only an alternative way to escape markup characters in text.
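
A quick sketch to illustrate, using ElementTree rather than minidom
(minidom keeps CDATA sections as separate nodes, which would obscure
the comparison). The two sample documents are made up; they differ
only in how the "<" is escaped, and white space is preserved the same
way in both.

```python
import xml.etree.ElementTree as ET

plain = ET.fromstring("<a>  x &lt; y  </a>")
cdata = ET.fromstring("<a><![CDATA[  x < y  ]]></a>")

# Both elements carry the same character data, white space included.
print(repr(plain.text))
print(repr(cdata.text))
```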

> Should I file this as a bug to be solved? I have my workaround now,
> but I read online that more people seem to have ran into this.

Feel free to come up with a patch. It is questionable whether a bug
report will help; there is a good chance that it stays open for several
years.

Regards,
Martin


