toprettyxml messes up with whitespaces

Jorgen Bodde jorgen.maillist at gmail.com
Wed Oct 3 05:30:04 EDT 2007


Hi there,

Thank you for confirming this. I did manage a workaround: when
reading the XML file back, I strip each line of its surrounding
whitespace before I parse it. Then, when writing it back, no excess
whitespace is appended. My best guess is that toprettyxml does not
handle the whitespace that is already there intelligently, and simply
adds more whitespace around it, so it grows with every read/write cycle.
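A minimal reproduction of the growth, assuming a current Python 3 and a made-up `<config>` root around the `<var>` element from this thread (the sample document and the helper name `read_update_write` are hypothetical). minidom keeps whitespace-only text nodes between elements, and `toprettyxml` writes each text node as indentation + existing data + newline, so every parse/write cycle wraps the old whitespace in new whitespace. As far as I know, recent Python versions write an element whose only child is text inline, so in this sketch the growth shows up between elements rather than inside `<var>`.

```python
from xml.dom import minidom

# Hypothetical sample file, modelled on the <var> element from the thread.
SOURCE = """<config>
  <var name="APPNAME" status="undefined">My First App</var>
</config>"""


def read_update_write(xml_text):
    """One parse -> toprettyxml cycle, with no whitespace cleanup."""
    dom = minidom.parseString(xml_text)
    return dom.toprettyxml(indent="  ")


text = SOURCE
sizes = []
for _ in range(3):
    text = read_update_write(text)
    sizes.append(len(text))

print(sizes)  # the output gets longer on every cycle
```

Printing `text` after the loop shows lines of indentation piling up around `<var>`.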

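This is the snippet:

    from xml.dom import minidom

    xmlstr = ""
    with open(filename, "rt") as f:  # filename: path of the XML file
        for line in f:
            s = line.strip(' \t\n')
            if s:
                xmlstr += s + ' '  # space needed for spanning text nodes
    dom = minidom.parseString(xmlstr)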

And then I simply use parseString instead of parse. But honestly, I
think it is a bug, because the XML standard also says that whitespace
before normal text should be ignored, and I do not get it back as text
when I read the node, so why preserve it and mess up the formatting in
the end?
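The same cleanup can be done on the parsed tree instead of on raw lines, so it does not depend on how the file is split into lines. This is a sketch, not the method from this thread: the helper names `strip_whitespace_nodes` and `pretty_roundtrip` and the sample document are hypothetical, and it assumes leading and trailing spaces inside text values are never meaningful (true for the `<var>` values shown here, not for XML in general). The whitespace does come back because minidom stores it in each text node's `data`; XML 1.0 requires a processor to pass all character data, whitespace included, through to the application.

```python
from xml.dom import Node, minidom

# Hypothetical sample: the <var> value carries extra whitespace.
SOURCE = """<config>
  <var name="APPNAME" status="undefined">
       My First App
  </var>
</config>"""


def strip_whitespace_nodes(node):
    """Recursively drop whitespace-only text nodes and trim the others.

    Assumes leading/trailing spaces in text values are not significant.
    """
    for child in list(node.childNodes):  # copy: the list is mutated below
        if child.nodeType == Node.TEXT_NODE:
            if child.data.strip():
                child.data = child.data.strip()
            else:
                node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_whitespace_nodes(child)


def pretty_roundtrip(xml_text, indent="  "):
    """Parse, clean up whitespace, then pretty-print with toprettyxml."""
    dom = minidom.parseString(xml_text)
    strip_whitespace_nodes(dom.documentElement)
    return dom.toprettyxml(indent=indent)


once = pretty_roundtrip(SOURCE)
twice = pretty_roundtrip(once)
print(once == twice)  # True: the output no longer grows
```

Because whitespace-only nodes are discarded and text is trimmed before every write, running the round trip twice gives the same output as running it once.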

Regards,
- Jorgen




On 10/2/07, kyosohma at gmail.com <kyosohma at gmail.com> wrote:
> On Oct 2, 11:43 am, "Jorgen Bodde" <jorgen.maill... at gmail.com> wrote:
> > Hi all,
> >
> > I parse an XML file, replace a node with a new one (like updating a
> > cache) and write it back. With every write, new spaces are added. For
> > example, the first read-update-write cycle:
> >
> > <var name="APPNAME" status="undefined">
> >      My First App
> > </var>
> >
> > Second cycle:
> >
> > <var name="APPNAME" status="undefined">
> >                  My First App
> > </var>
> >
> > Third cycle:
> >
> > <var name="APPNAME" status="undefined">
> >                                            My First App
> > </var>
> >
> > And this goes on. The node is one that is not touched in the XML; it
> > is simply written back after reading. I have the same problem with
> > blank space between the nodes, which I managed to compensate for by
> > stripping the lines.
> >
> > I would like to use toprettyxml to make it user-editable and viewable.
> > But this is really weird. How can I circumvent this behaviour?
> >
> > regards,
> > - Jorgen
>
> I had similar problems and ended up switching to the lxml package to
> solve the issue. I think you can do it with ElementTree too. Maybe
> somebody with more experience with the xml / minidom modules will show
> up soon.
>
> Mike
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>
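Mike's ElementTree suggestion can be sketched with the standard library's `xml.etree.ElementTree`, assuming Python 3.9 or later for `ET.indent`. `ET.indent` replaces whitespace-only text and tails with fresh indentation instead of adding to it, so repeated round trips stay stable; it leaves non-blank text alone, so the sketch trims element text itself (again assuming surrounding spaces in values do not matter). `pretty_roundtrip` and the sample document are hypothetical. With lxml, which Mike switched to, the parser option `remove_blank_text=True` serves a similar purpose for whitespace between elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with extra whitespace around the value.
SOURCE = """<config>
  <var name="APPNAME" status="undefined">
       My First App
  </var>
</config>"""


def pretty_roundtrip(xml_text, space="  "):
    """Parse, trim element text, re-indent and serialize (Python 3.9+)."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.text is not None:
            # Assumes surrounding spaces in text values are not significant.
            elem.text = elem.text.strip() or None
    # ET.indent overwrites whitespace-only text/tails instead of appending.
    ET.indent(root, space=space)
    return ET.tostring(root, encoding="unicode")


once = pretty_roundtrip(SOURCE)
twice = pretty_roundtrip(once)
print(once)
print(once == twice)  # True: repeated cycles do not add whitespace
```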


