toprettyxml messes up with whitespaces

Paul Boddie paul at boddie.org.uk
Wed Oct 3 05:57:44 EDT 2007


On 3 Oct, 11:30, "Jorgen Bodde" <jorgen.maill... at gmail.com> wrote:
>
> Thank you for confirming this. I did manage a workaround: when
> reading back the XML file, I strip it of its whitespace before I
> parse it. Then, when writing it back, no excessive whitespace is
> appended. My best guess is that toprettyxml is not intelligently
> handling whitespace that is already there, and bluntly appends more
> whitespace to it, making it grow exponentially.

This seems like a reasonable explanation without having looked at the
source code myself.
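Jorgen's guess can be checked with a short experiment. The sketch below (`round_trip` is a hypothetical helper name) parses a document with minidom and pretty-prints it again, several times over. As far as I can tell from CPython's minidom, a text node is written as indent + data + newline, so the whitespace-only text nodes left by one pass are re-indented by the next, and the output gets longer on every round trip.

```python
from xml.dom import minidom

def round_trip(xml_text):
    """Parse XML text and pretty-print it again with minidom."""
    return minidom.parseString(xml_text).toprettyxml(indent="  ")

# Each pass keeps the whitespace text nodes written by the previous pass
# and wraps them in fresh indentation and newlines, so the output grows.
text = "<a><b/></a>"
for i in range(3):
    text = round_trip(text)
    print(i + 1, len(text))
```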

[...]

> And then I simply use parseString instead of parse. But honestly, I
> think it is a bug, because the XML standard also says that whitespace
> before normal text should be ignored, and I do not get it back as text
> when I read the node, so why preserve it and mess up the formatting in
> the end?
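Jorgen's actual code was elided above, so the following is not his workaround but a sketch of the same idea done on the parsed tree rather than on the raw text: drop whitespace-only text nodes after parsing, so that toprettyxml starts from a clean tree. `strip_whitespace_nodes` and `pretty` are hypothetical names; only `parseString`, `removeChild` and `toprettyxml` come from minidom itself.

```python
from xml.dom import minidom, Node

# Whitespace characters as defined by the XML spec (the S production).
XML_WHITESPACE = " \t\r\n"

def strip_whitespace_nodes(node):
    """Remove whitespace-only text nodes below `node`, in place."""
    # Copy the child list first, because we remove children while looping.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip(XML_WHITESPACE):
            node.removeChild(child)
        elif child.hasChildNodes():
            strip_whitespace_nodes(child)
    return node

def pretty(xml_text):
    """Parse, discard the old indentation, then pretty-print afresh."""
    doc = minidom.parseString(xml_text)
    strip_whitespace_nodes(doc)
    return doc.toprettyxml(indent="  ")
```

With element-only content such as `<a><b>text</b><c/></a>`, feeding the output of `pretty` back into `pretty` should give the same text again. Text mixed with elements (for example `<p>Hello <i>x</i></p>`) is still re-indented on every pass, since those text nodes are not whitespace-only.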

Which part of the standard is this? Here's the XML 1.0 specification's
section on whitespace:

http://www.w3.org/TR/2006/REC-xml-20060816/#sec-white-space

It seems to me that applications (and the libraries which serve them)
can choose what to do unless xml:space is set to "preserve". It does
seem odd that the toprettyxml method chooses to respect existing
whitespace whilst also disrupting it by adding more, however.
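To illustrate the xml:space point, here is a sketch of a whitespace-stripping pass that leaves alone any element marked xml:space="preserve" (and its descendants, since the spec says the value is inherited) and resumes stripping under xml:space="default". `strip_ignorable_whitespace` is a hypothetical name, not part of minidom. It assumes the default namespace-aware `minidom.parseString`, which reports `xml:space` in the XML namespace.

```python
from xml.dom import minidom, Node, XML_NAMESPACE

# Whitespace characters as defined by the XML spec (the S production).
XML_WHITESPACE = " \t\r\n"

def strip_ignorable_whitespace(node, preserve=False):
    """Remove whitespace-only text nodes, honouring inherited xml:space."""
    if node.nodeType == Node.ELEMENT_NODE:
        space = node.getAttributeNS(XML_NAMESPACE, "space")
        if space == "preserve":
            preserve = True
        elif space == "default":
            preserve = False
    # Copy the child list first, because we remove children while looping.
    for child in list(node.childNodes):
        if (child.nodeType == Node.TEXT_NODE and not preserve
                and not child.data.strip(XML_WHITESPACE)):
            node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            # Pass the inherited setting down to the child element.
            strip_ignorable_whitespace(child, preserve)
    return node
```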

Paul
