[XML-SIG] SAX, escape problem

Gregor Mosheh stigmata@blackangel.net
Tue, 2 Apr 2002 20:15:41 -0800 (PST)


It appears that the SAX writer (xml.sax.writer.PrettyPrinter) is failing to
convert quote marks and some other characters in attribute values into
their entities. How do I correct this?


### This test program...

import xml.sax, xml.sax.writer, xml.sax.handler
import cStringIO
import re
def encode(hash):
    buffer = cStringIO.StringIO()
    saxout = xml.sax.writer.PrettyPrinter(buffer)
    saxout.startDocument()
    saxout.startElement("objects",{})
    # Attribute values go in raw; the writer is expected to escape them.
    saxout.startElement("object",hash)
    saxout.endElement("object")
    saxout.endElement("objects")
    saxout.endDocument()
    return buffer.getvalue()
print encode( { 'keyword' : '< \"this is in quotes\" >' } )+"\n\n"

### ...generates this output. Quote characters in the attribute value
### are not escaped, although <, >, and & are, so the result is not even
### well-formed XML (the value is itself delimited by double quotes).
### The same happens with apostrophes and with Unicode characters
### above 128.

<?xml version="1.0" encoding="iso-8859-1"?>
<objects><object keyword="&lt; "this is in quotes" &gt;"/></objects>
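For comparison, here is a minimal sketch of what correctly escaped attribute text looks like. It assumes Python 3 and the standard-library helpers escape() and quoteattr() from xml.sax.saxutils, not the PyXML xml.sax.writer module used above, so it only illustrates the expected result rather than that writer's internals. escape() handles only &, < and > unless you pass extra entities; quoteattr() also wraps the value in a quote character that keeps it well-formed.

```python
from xml.sax.saxutils import escape, quoteattr

value = '< "this is in quotes" >'

# escape() always replaces & first, then > and <; quotes are left alone.
plain = escape(value)

# Extra replacements are applied after the & replacement, so the
# &quot; and &apos; entities added here are not themselves re-escaped.
full = escape(value, {'"': "&quot;", "'": "&apos;"})

# quoteattr() escapes like escape() and returns the value wrapped in
# quotes; it switches to single quotes when the value contains only
# double quotes, and uses &quot; only when both kinds are present.
attr = quoteattr(value)

print(plain)  # &lt; "this is in quotes" &gt;
print(full)   # &lt; &quot;this is in quotes&quot; &gt;
print(attr)   # '&lt; "this is in quotes" &gt;'
```

The first line matches what the writer produced; the last two are both valid ways to put that value inside an attribute.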



### So, I tried inserting some code to do the
### quote-to-&quot; substitutions myself.
### And this test program...

import xml.sax, xml.sax.writer, xml.sax.handler
import cStringIO
import re
def encode(hash):
    buffer = cStringIO.StringIO()
    saxout = xml.sax.writer.PrettyPrinter(buffer)
    saxout.startDocument()
    saxout.startElement("objects",{})
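    # Pre-escape quotes by hand before handing the values to the writer
    # (this also modifies the caller's dict in place).
    for thiskey in hash.keys():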
        thisval = hash[thiskey]
        thisval = re.sub('"','&quot;',thisval)
        thisval = re.sub("'",'&apos;',thisval)
        hash[thiskey] = thisval
    saxout.startElement("object",hash)
    saxout.endElement("object")
    saxout.endElement("objects")
    saxout.endDocument()
    return buffer.getvalue()
print encode( { 'keyword' : '< \"this is in quotes\" >' } )+"\n\n"

### ...generates the following output: the writer escapes the & of my
### &quot; entity again, producing &amp;quot;. The same thing happens,
### of course, if I do similar substitutions for Unicode characters
### above 128 or for apostrophes.

<?xml version="1.0" encoding="iso-8859-1"?>
<objects><object keyword="&lt; &amp;quot;this is in quotes&amp;quot; &gt;"/></objects>
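One way around the double escaping, sketched here under the assumption of Python 3 and its standard-library xml.sax.saxutils.XMLGenerator (a different class from PyXML's PrettyPrinter, so this is not a fix for that module itself): pass the raw attribute values and let the generator do all of the escaping. XMLGenerator runs attribute values through quoteattr(), and in CPython's implementation a binary output stream is wrapped with errors="xmlcharrefreplace", so characters the declared encoding cannot represent come out as numeric character references. The encode() function below is a hypothetical port of the test program, not an existing API.

```python
import io
from xml.sax.saxutils import XMLGenerator


def encode(attrs):
    """Serialize attrs as the attributes of <object/> inside <objects>.

    Hypothetical Python 3 port of the test program: values are passed
    in raw, and XMLGenerator does all of the escaping itself.
    """
    buffer = io.BytesIO()
    # A binary target makes XMLGenerator encode its output; characters
    # outside iso-8859-1 become numeric character references.
    saxout = XMLGenerator(buffer, encoding="iso-8859-1",
                          short_empty_elements=True)
    saxout.startDocument()
    saxout.startElement("objects", {})
    saxout.startElement("object", attrs)  # do NOT pre-escape the values
    saxout.endElement("object")
    saxout.endElement("objects")
    saxout.endDocument()
    return buffer.getvalue().decode("iso-8859-1")


print(encode({'keyword': '< "this is in quotes" >'}))
```

With the original input, the attribute comes out as keyword='&lt; "this is in quotes" &gt;': the generator picks single-quote delimiters instead of writing &quot;, which is equally valid XML, and nothing is escaped twice.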