toprettyxml messes up with whitespaces

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Wed Oct 3 08:49:51 EDT 2007


On Wed, 03 Oct 2007 12:18:45 +0200, Jorgen Bodde wrote:

>> Which part of the standard is this? Here's the XML 1.0 specification's
>> section on whitespace:
>>
>> http://www.w3.org/TR/2006/REC-xml-20060816/#sec-white-space
> 
> Well 2.10 if I quote:
> 
> <quote>
> Such white space is typically not intended for inclusion in the
> delivered version of the document. On the other hand, "significant"
> white space that should be preserved in the delivered version is
> common, for example in poetry and source code.
> </quote>
> 
> I interpret "significant" whitespaces as the ones between the words,
> if whitespaces occur at the beginning of a line due to an indent like

Significant whitespace is all whitespace in nodes that may contain text.
You need a DTD or schema to decide this, which is why all pretty printing
without a DTD or schema is broken IMHO: without one you simply don't know
whether it is safe to strip or add whitespace.
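
To make that concrete, here is a minimal sketch using `xml.dom.minidom`.
The mixed-content snippet and the `text_content` helper are made up for
illustration, and the exact whitespace that `toprettyxml` inserts may
differ between Python versions, so the sketch only checks that the text
changes at all:

```python
from xml.dom import minidom


def text_content(node):
    """Concatenate the data of all text nodes below `node`, in document order."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    return "".join(text_content(child) for child in node.childNodes)


# Hypothetical mixed content: text and an element side by side in <p>.
source = "<p>Hello <b>world</b>!</p>"

original = minidom.parseString(source)
pretty = original.toprettyxml(indent="  ")

# Parse the pretty-printed output again and compare the character data.
roundtrip = minidom.parseString(pretty)

before = text_content(original.documentElement)
after = text_content(roundtrip.documentElement)

print(repr(before))
print(repr(after))
print("text changed:", before != after)
```

Without a DTD or schema the pretty printer cannot know that `<p>` holds
mixed content, so it indents the children anyway and the character data
that an application reads back is no longer the same.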

> <value>
>      This is indented text
> </value>
> 
> We can assume that the spaces in front of it are not significant
> whitespaces.

I can't.  You are just guessing.

> Because when I read the text node in python and it is not
> included, I see no reason why it should be preserved.

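But it should be included.  An XML processor has to pass all characters
that are not markup through to the application, and that covers the
indent.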
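
A quick check with `xml.dom.minidom`, assuming the example from above is
parsed from a string exactly as written (five spaces of indent):

```python
from xml.dom import minidom

# The <value> example from the quoted message, including its indent.
source = "<value>\n     This is indented text\n</value>"

doc = minidom.parseString(source)
data = doc.documentElement.firstChild.data

# The newlines and the leading spaces are part of the text node.
print(repr(data))
```

The leading spaces and both newlines show up in the text node's `data`.
If they are missing in your case, something between the file and your
code stripped them.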

Ciao,
	Marc 'BlackJack' Rintsch


