[XML-SIG] CDATA sections still not handled

Martin v. Loewis martin@mira.cs.tu-berlin.de
Wed, 17 Jan 2001 00:54:14 +0100


> I was following the logic that ext.PrettyPrint can write to a stream

That assumption is correct; it does indeed.

> and that it is useful to pick up a document that has escaped
> data (which may be xml itself), add some nodes to it, and save it
> back to the stream expecting the escaped sections to be still
> present as escaped sections.

That logic is flawed (or rather, there is no logic in it; it is just
an assertion). Why is that useful? I.e., why would anybody who reads
the resulting document need to know exactly where the CDATA sections
were located in the original document?

> So what I understand now is that I should either use a serializer
> that keeps these, or write a DTD and use that to write my xml back
> out to file in a more proper way.

I think your understanding is incorrect. It is not possible to write a
serializer that produces the original input by just looking at the DOM
tree, and having a DTD does not help at all, either.
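
To illustrate why the tree alone cannot tell you what the input looked
like, here is a minimal sketch. It uses the standard-library
xml.etree.ElementTree module as a stand-in for a parsed tree, which is
an assumption made for the illustration (it is not the PyXML DOM
discussed in this thread). The sample documents and variable names are
made up.

```python
import xml.etree.ElementTree as ET

# Two different byte sequences: one uses a CDATA section, the other
# uses entity escapes for the same characters.
with_cdata = "<note><![CDATA[<b>bold</b>]]></note>"
with_escapes = "<note>&lt;b&gt;bold&lt;/b&gt;</note>"

text_a = ET.fromstring(with_cdata).text
text_b = ET.fromstring(with_escapes).text

# Both inputs yield the same character data in the tree, so nothing in
# the tree records which spelling the original file used.
same_content = text_a == text_b

# The same holds for other lexical details, such as the quote style
# used around attribute values.
single_quoted = ET.fromstring("<a x='1'/>").attrib
double_quoted = ET.fromstring('<a x="1"/>').attrib
same_attributes = single_quoted == double_quoted
```

Since several distinct inputs map to one tree, a serializer that sees
only the tree has to pick one spelling, and it cannot know which one
the original used.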

> Which I guess is my next question, what is the cleanest method in
> PyXML for reading in such a file with CDATA sections, and getting
> them back out when rewriting?

The cleanest way is to accept that it is not possible to write the
document back so that it equals the original document on a
byte-by-byte basis.

It is possible to write the document back so that the content is the
same as in the original document; the cleanest way for that is to use
ext.PrettyPrint.
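
As a rough sketch of what "the content is the same" means in practice,
the following round trip reads a document with a CDATA section, adds a
node, writes it out, and reads it back. It uses the standard-library
xml.etree.ElementTree module instead of ext.PrettyPrint, which is an
assumption made so the example is self-contained; the element names
are hypothetical.

```python
import xml.etree.ElementTree as ET

original = '<doc><payload><![CDATA[<inner a="1"/>]]></payload></doc>'

# Read the document and add a node, as in the scenario described above.
root = ET.fromstring(original)
ET.SubElement(root, "added").text = "new node"

# Write it back out. The serializer escapes the markup characters
# instead of emitting a CDATA section.
rewritten = ET.tostring(root, encoding="unicode")

# Read the rewritten document again and compare the character data.
reparsed = ET.fromstring(rewritten)
payload_before = root.find("payload").text
payload_after = reparsed.find("payload").text
content_preserved = payload_before == payload_after
```

The rewritten text differs from the original on a byte-by-byte basis,
but any consumer that parses it sees the same payload string.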

Regards,
Martin

P.S. What you *can* get back is CDATA sections for every text node,
by properly inheriting from the PrettyPrinter. However, this will give
you CDATA sections in places where the original document had none.
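
The effect described in the P.S. can be sketched as follows. Instead
of subclassing the PyXML printer, this example takes a different
route and rewrites the tree with the standard-library xml.dom.minidom
module before serializing; that substitution, the helper name
text_to_cdata, and the sample document are all assumptions made for
the illustration.

```python
from xml.dom import Node, minidom


def text_to_cdata(node, doc):
    """Replace every plain text child below node with a CDATA section."""
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE:
            # Note: minidom refuses to serialize a CDATA section whose
            # data contains "]]>", so such text would need splitting.
            node.replaceChild(doc.createCDATASection(child.data), child)
        elif child.nodeType == Node.ELEMENT_NODE:
            text_to_cdata(child, doc)


# <b> uses an entity escape, <c> already uses a CDATA section.
doc = minidom.parseString(
    "<a><b>x &lt; y</b><c><![CDATA[1 < 2]]></c></a>"
)
text_to_cdata(doc.documentElement, doc)
result = doc.documentElement.toxml()
```

After the rewrite, the content of <b> is also written as a CDATA
section, although the input used an entity escape there. That is
exactly the drawback mentioned above: CDATA shows up in places where
the original document had none.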