[XML-SIG] losing cdata tag

Paul Tremblay phthenry@earthlink.net
Mon, 27 May 2002 00:14:57 -0400


On Sun, May 26, 2002 at 09:35:48PM +0200, Juergen Hermann wrote:
>
> 
> >I have copied the code below from python cookbook, but it is not
> >working. Specifically, the "def startCDATA(self)" method does not
> >get invoked.
> 
> That depends on the parser you use.

Yes, I was afraid this was the case. In fact, I don't know what
parser I am using! Pardon my ignorance. I know that expat
is on my system. I also know that about 3 weeks ago I
downloaded the library from python--not the newest one, but the
one that has been a standard for a long time. (Does the snippet
of code tell you what parser I am using?)
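
One way to answer the "which parser" question is to ask the object that
make_parser() returns. This is only a minimal sketch; the variable names
are illustrative, and the result depends on which drivers are installed.

```python
from xml.sax import make_parser

# make_parser() returns the first SAX driver it can import.
parser = make_parser()

# The module and class name of that object identify the driver in use.
parser_name = parser.__class__.__module__ + "." + parser.__class__.__name__
print(parser_name)
```

If the printed name mentions expat, the expat driver is doing the parsing.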

> 
> >Since this method does not get invoked, my cdata is not getting
> >put between the right tags. That means my output file is no
> >longer valid xml, and I cannot parse it again.
> 
> Are you sure on that, or did you just guess that? When in_cdata is
> never set, ALL text becomes escaped, i.e. the output is still valid
> XML, even if without CDATA sections.

Yes, I did try to parse the data again with no luck. When I use
an xslt style sheet with xsltproc, the CDATA in fact becomes
escaped. But when I use these libraries:


from xml.sax import saxutils
from xml.sax import make_parser
from xml.sax.handler import feature_namespaces

the data is not escaped. Specifically, I had this inside my
CDATA section:

<some text>

This was passed on as "<some text>" and not as "&lt;some text&gt;".
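
In SAX2, startCDATA() and endCDATA() are lexical events, so they only
reach a handler that has been registered through the lexical-handler
property. The sketch below assumes the parser driver supports that
property (the expat driver in current Python does; whether a given older
install does is not something this example can confirm). The class name
CDATAPreserver and the function copy_with_cdata are made up for
illustration, and attributes are ignored to keep it short.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, property_lexical_handler
from xml.sax.saxutils import escape


class CDATAPreserver(ContentHandler):
    """Hypothetical handler: rebuilds the document, keeping CDATA sections."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.in_cdata = False
        self.pieces = []

    # --- content events ---
    def startElement(self, name, attrs):
        self.pieces.append("<%s>" % name)  # attributes ignored for brevity

    def endElement(self, name):
        self.pieces.append("</%s>" % name)

    def characters(self, content):
        # Inside CDATA the text is written raw; elsewhere it is escaped.
        self.pieces.append(content if self.in_cdata else escape(content))

    # --- lexical events (only delivered if registered as lexical handler) ---
    def startCDATA(self):
        self.in_cdata = True
        self.pieces.append("<![CDATA[")

    def endCDATA(self):
        self.pieces.append("]]>")
        self.in_cdata = False

    def comment(self, text):
        pass

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass


def copy_with_cdata(xml_text):
    handler = CDATAPreserver()
    parser = make_parser()
    parser.setContentHandler(handler)
    # Without this line startCDATA()/endCDATA() are never invoked.
    parser.setProperty(property_lexical_handler, handler)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return "".join(handler.pieces)


print(copy_with_cdata("<doc><![CDATA[<some text>]]></doc>"))
```

A driver that does not recognise the property should raise an exception
from setProperty(), which at least makes the problem visible instead of
silently dropping the CDATA markers.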

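I have come up with a temporary solution. Before I parse my file,
I make the following substitutions:

import re
# "text" is assumed to hold the contents of the file as a string
text = re.sub(r"<!\[CDATA\[", "<![CDATA[\n##BEG OF CDATA##", text)
text = re.sub(r"\]\]>", "##END OF CDATA##\n]]>", text)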

After I parse the file, I know exactly where my CDATA sections
begin and end, and I replace the markers with the tags again
("<![CDATA[" and "]]>"). I am then ready to parse the data again.
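
The whole round trip can be sketched as two small functions. The names
mark_cdata and restore_cdata are hypothetical, and the sketch assumes the
marker strings never occur in the document's own text and that the
parsing step passes the CDATA contents through unescaped.

```python
import re

BEG = "##BEG OF CDATA##"
END = "##END OF CDATA##"


def mark_cdata(text):
    """Put a marker just inside each CDATA section before parsing."""
    text = re.sub(r"<!\[CDATA\[", "<![CDATA[\n" + BEG, text)
    return re.sub(r"\]\]>", END + "\n]]>", text)


def restore_cdata(text):
    """Turn the markers back into CDATA delimiters after parsing."""
    text = text.replace("\n" + BEG, "<![CDATA[")
    return text.replace(END + "\n", "]]>")


original = "<doc><![CDATA[<some text>]]></doc>"
marked = mark_cdata(original)

# Stand-in for the real parse step: a writer that loses the CDATA
# delimiters but leaves the text between them untouched.
parsed = marked.replace("<![CDATA[", "").replace("]]>", "")

restored = restore_cdata(parsed)
print(restored)
```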

Although my method works, it seems like a pretty bad hack. 

Thanks

Paul


-- 

************************
*Paul Tremblay         *
*phthenry@earthlink.net*
************************