[XML-SIG] DOM normalize() broken? entity refs lost?
Jeff.Johnson@icn.siemens.com
Wed, 28 Apr 1999 13:21:04 -0400
Thanks for the entity reference fix, Andrew. It now saves "®" but it still
loses things like "’". I think this is Unicode generated by the RTF to HTML
filter I'm using. I can change the filter's character translation table to
convert RTF "quoteright" to "'" instead of "’", but I'm curious where the
entity ref is going. I put some debug statements in
HtmlBuilder.handle_entityref() but it never gets called. I know there is
controversy over Unicode support, but I don't know enough about it to know
what to expect in this case.
A new script is included:
import sys, os
from StringIO import StringIO
from xml.dom import utils
from xml.dom.writer import HtmlWriter, XmlWriter
html = """
<P>Don’t</P>
"""
# This works with Andrew's patch, but the Unicode single quote still vanishes
# without a trace.
#<P>Registered ®</P>
fr = utils.FileReader()
dom = fr.readStream(StringIO(html),'HTML')
w = XmlWriter()
w.write(dom)
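
One possible reason handle_entityref() never fires, offered only as a guess: if
the filter wrote the quote as a numeric character reference rather than a named
entity, SGML-style parsers usually hand it to a separate character-reference
callback. The sketch below shows that split using the modern standard-library
html.parser, not the xml.dom HtmlBuilder from the script above. The "&#8217;"
spelling is an assumption (the exact form the filter emitted is not shown
here), and RefLogger and log_refs are hypothetical names used for illustration.

```python
from html.parser import HTMLParser


class RefLogger(HTMLParser):
    """Record which callback fires for each reference in the input."""

    def __init__(self):
        # convert_charrefs=False stops the parser from folding references
        # into handle_data(), so the two callbacks below are invoked.
        super().__init__(convert_charrefs=False)
        self.calls = []

    def handle_entityref(self, name):
        # Named references such as &reg; arrive here.
        self.calls.append(("entityref", name))

    def handle_charref(self, name):
        # Numeric references such as &#8217; arrive here instead.
        self.calls.append(("charref", name))


def log_refs(markup):
    """Parse markup and return the list of (callback, name) pairs seen."""
    parser = RefLogger()
    parser.feed(markup)
    parser.close()
    return parser.calls


print(log_refs("<P>Registered &reg;</P>"))
print(log_refs("<P>Don&#8217;t</P>"))
```

If the builder in use follows the same convention, a debug statement in its
character-reference handler (if it has one) may show where the quote goes.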