[XML-SIG] losing cdata tag

Paul Tremblay phthenry@earthlink.net
Fri, 31 May 2002 15:20:47 -0400


> 
> Now, this is *really* surprising, since, in
> 
> http://mail.python.org/pipermail/xml-sig/2002-May/007750.html
> 
> you reported that this very code executes sucessfully (even though not
> with the desired result).
> 
> Can you please double-check that you are executing the code that you
> said you are executing?

Sorry if I didn't make this clear from the start. There were two
sets of code. The code you mentioned above I got off python
cookbook. I have just tried this code, and hot this result:

<?xml version="1.0" encoding="iso-8859-1"?>
<foo>
&lt;some text&gt;
</foo>

So the CDATA is escaped! (Though it would be nice to have the
characters unescaped but enclosed in CDATA tags; that way it is
easier to go back and change things on your xml doc before final
output.)

The other set of code apparently uses a different library. I had
been using this code for my parsing, and saw that it wasn't
escaping the CDATA. This code must somehow be broken.

I am a big unfamiliar with the "good" code. I suppose it follows
all of the SAX protocols. However, I notice that it is very, very
slow. It took around 15 seconds just to parse the above data!

Any thoughts as to why the "bad" code is bad would help me. 

Thanks

Paul

##############################

# This code does not escape the CDATA.
# It is the code I have been working with 
# all along

#!/usr/bin/python

from xml.sax import saxutils


from xml.sax import make_parser
from xml.sax.handler import feature_namespaces


class CopyTree(saxutils.DefaultHandler):
        def __init__(self):
                self.character = ""
 

        def startElement(self, name, attrs):
                
                print "<" + name,
                for theKey in attrs.keys():
                                print " " + theKey + "=\"" + attrs[theKey] + "\"",
                print  ">",

        def characters(self, ch):
                self.character = self.character + ch
                

        def endElement(self, name):
                print self.character + "</" + name + ">",
                self.character=""

parser = make_parser()

#Tell the parser we are not interested in XML namespaces
parser.setFeature(feature_namespaces, 0)

# Create the handler
dhObj = CopyTree()

# Tell the parser to use our handler
parser.setContentHandler(dhObj)

# Parse the input
file = "/home/paul/paultemp/test.xml"

parser.parse(file)







-- 

************************
*Paul Tremblay         *
*phthenry@earthlink.net*
************************