[XML-SIG] DOM normalize() broken? entity refs lost?

Jeff.Johnson@icn.siemens.com
Wed, 28 Apr 1999 13:21:04 -0400


Thanks for the entity reference fix, Andrew.  It now saves "&reg;" but it
still loses things like "&#8217;".  I think this is Unicode generated by the
RTF-to-HTML filter I'm using.  I can change the filter's character
translation table to convert RTF "quoteright" to "'" instead of "&#8217;",
but I'm curious where the entity ref is going.  I put some debug statements
in HtmlBuilder.handle_entityref(), but it never gets called.  I know there is
controversy over Unicode support, but I don't know enough about it to know
what to expect in this case.

A new script is included:

import sys, os
from StringIO import StringIO

from xml.dom import utils
from xml.dom.writer import HtmlWriter, XmlWriter

html = """
<P>Don&#8217;t</P>
"""
# The &reg; case below works with Andrew's patch, but the Unicode single
# quote (&#8217;) above still vanishes without a trace.
#<P>Registered &reg;</P>

fr = utils.FileReader()
dom = fr.readStream(StringIO(html),'HTML')
w = XmlWriter()
w.write(dom)
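
On the handle_entityref() question above: "&#8217;" is a numeric character
reference, while "&reg;" is a named entity reference, and SGML-style parsers
commonly route the two to different callbacks.  The sketch below only
illustrates that split using the present-day standard-library html.parser
module.  It is not the HtmlBuilder from this thread, and whether HtmlBuilder
dispatches the same way is an assumption.  RefLogger and log_refs are
hypothetical names made up for the example.

```python
from html.parser import HTMLParser


class RefLogger(HTMLParser):
    """Record which handler each reference in the input is routed to.

    Hypothetical helper for illustration; not part of xml.dom.
    """

    def __init__(self):
        # convert_charrefs=False keeps references as separate events
        # instead of folding them into handle_data().
        super().__init__(convert_charrefs=False)
        self.events = []

    def handle_entityref(self, name):
        # Called for named references such as &reg;
        self.events.append(("entityref", name))

    def handle_charref(self, name):
        # Called for numeric references such as &#8217;
        self.events.append(("charref", name))


def log_refs(markup):
    """Parse markup and return the list of reference events seen."""
    parser = RefLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


if __name__ == "__main__":
    print(log_refs("<P>Don&#8217;t</P><P>Registered &reg;</P>"))
```

If HtmlBuilder follows the same pattern, debug statements in
handle_entityref() would never fire for "&#8217;", and a handle_charref()
method (if one exists there) would be the place to look.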