Problem round-tripping with xml.dom.minidom pretty-printer

Ben Butler-Cole ben.butlercole at gmail.com
Fri Feb 29 12:21:05 EST 2008


> The last line of p() calls itself: it is an unconditional recursive call
> so, no matter what it does, it will never stop. And since p() also
> prints something, calling it will print endlessly.

Sorry, I wasn't clear. I realize that this recurses endlessly. The
problem is that it also adds blank lines endlessly.

> By removing this line, you get something like:
>
> <?xml version="1.0" ?>
> <a>
>         <b>
>                 <c/>
>         </b>
> </a>
>
> That seems sensible, imo. Was that what you wanted?

Sure. That's fine unless you then re-parse this output and print it
again, in which case you get the behaviour you describe:

> An additional thing to keep in mind is that toprettyxml does not print
> an XML identical to the original DOM tree: it adds newlines and tabs.
> When parsed again these blank characters are inserted in the DOM tree as
> character nodes. If you toprettyxml an XML document twice in a row, then
> the second one will also add newlines and tabs around the newlines and
> tabs added by the first. Since you call toprettyxml an infinite number
> of times, it is expected that lots of blank characters appear.

Right. That's the behaviour I'm asking about, and I consider it
problematic. I would expect any module that provides both a parser and
a pretty-printer (not just one for XML) to be able to round-trip
conservatively.
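
To make the complaint concrete, here is a minimal sketch of the round trip
I mean. The sample document and the helper name pretty() are only
illustrative; the minidom calls are parseString() and toprettyxml() with
their default arguments.

```python
from xml.dom import minidom

# Illustrative input; any element-only document shows the same effect.
SOURCE = "<a><b><c/></b></a>"


def pretty(xml_text):
    """Parse xml_text and return minidom's pretty-printed form."""
    return minidom.parseString(xml_text).toprettyxml()


first = pretty(SOURCE)    # indentation added once
second = pretty(first)    # the added indentation is re-parsed as text nodes
third = pretty(second)    # ... and gets indented again

# Each pass is longer than the one before, so the output never settles.
print(len(first), len(second), len(third))
```

Each pass wraps the whitespace text nodes left by the previous pass in
fresh indentation, so the output keeps growing instead of reaching a
fixed point.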

As far as I can see (and your comments back this up) minidom doesn't
have this property. Unless anyone knows how to get it to behave that
way...
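
One possible workaround, offered as a sketch and not as something minidom
provides: drop whitespace-only text nodes after parsing and before
pretty-printing. The helper names strip_whitespace_nodes() and
pretty_stable() are made up for this example. It assumes whitespace-only
text carries no meaning in the document, and it does not help with mixed
content (text and elements side by side), where the text itself still
picks up extra indentation on each pass.

```python
from xml.dom import minidom, Node


def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace.

    Assumption: such nodes are formatting noise, not significant data.
    """
    # Iterate over a copy, since we remove children while looping.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
            child.unlink()
        else:
            strip_whitespace_nodes(child)


def pretty_stable(xml_text):
    """Parse, discard whitespace-only text nodes, then pretty-print."""
    doc = minidom.parseString(xml_text)
    strip_whitespace_nodes(doc)
    return doc.toprettyxml()


once = pretty_stable("<a><b><c/></b></a>")
twice = pretty_stable(once)

# For this element-only document the second pass changes nothing.
print(once == twice)
```

For element-only documents like the one above this does make repeated
pretty-printing stable, at the cost of discarding whatever whitespace was
in the original.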

Ben


