[XML-SIG] losing entities when parsing then texting
Greg Wilson
gvwilson at cs.utoronto.ca
Thu Jun 30 18:19:14 CEST 2005
This one must have come up several times before, but neither Google nor
the Cookbook has given me an answer. I'm doing this:
import sys
import xml.dom.minidom

data = sys.stdin.read()
doc = xml.dom.minidom.parseString(data)
root = doc.documentElement
# ...add and modify some nodes...
sys.stdout.write(root.toxml('utf-8'))
A typical input looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE lec SYSTEM "swc.dtd">
<lec title="Introduction">
<topic title="Motivation" summary="motivation for course">
<slide>
<b1>blah
<b2>blah &amp; blah</b2>
<b2>blah&emdash;blah</b2>
</b1>
</slide>
</topic>
</lec>
and my DTD, in its entirety, is:
<!ENTITY emdash "&#8212;"> <!-- em dash -->
<!ENTITY lceil "&#8968;"> <!-- left ceiling -->
<!ENTITY ldots "&#8230;"> <!-- horizontal ellipsis -->
<!ENTITY lfloor "&#8970;"> <!-- left floor -->
<!ENTITY lquot "&#8220;"> <!-- left double quotes -->
<!ENTITY plusmn "&#177;"> <!-- plus or minus -->
<!ENTITY nbsp "&#160;"> <!-- non-breaking space -->
<!ENTITY rceil "&#8969;"> <!-- right ceiling -->
<!ENTITY rfloor "&#8971;"> <!-- right floor -->
<!ENTITY rquot "&#8221;"> <!-- right double quotes -->
<!ENTITY space "&#32;"> <!-- normal space -->
<!ENTITY squot "&#34;"> <!-- straight double quotes -->
<!ENTITY times "&#215;"> <!-- multiplication sign -->
<!ENTITY vdots "&#8942;"> <!-- vertical ellipsis -->
Problem is, all of the character entities are missing from my output:
&amp; and &emdash; disappear. Hunting around the web, it appears that
I'm supposed to mess with ExternalEntityRefHandler, but I can't find any
examples of how the pieces fit together. If anyone has one, I'd be
grateful for a pointer...
Thanks,
Greg (gvwilson _a_t_ cs _dot_ utoronto _dot_ ca)
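
For reference, here is a minimal sketch of how an ExternalEntityRefHandler
is usually wired into a pyexpat parser. It uses xml.parsers.expat directly
and not minidom. The function name parse_with_dtd and the idea of passing
the DTD as a string are assumptions made for illustration, so treat it as a
starting point and not as a tested fix for the script above.

```python
import xml.parsers.expat as expat


def parse_with_dtd(xml_text, dtd_text):
    """Return the character data of xml_text, expanding entities from dtd_text.

    Hypothetical helper: the name and the dtd_text parameter are illustrative.
    """
    chunks = []
    parser = expat.ParserCreate()
    # Ask expat to process the external DTD subset; by default it is skipped.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def external_entity_ref(context, base, system_id, public_id):
        # A real handler would open system_id (e.g. "swc.dtd"); the DTD text
        # is passed in directly here to keep the sketch self-contained.
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(dtd_text, True)
        return 1  # non-zero tells expat the entity was handled

    parser.ExternalEntityRefHandler = external_entity_ref
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_text, True)
    return "".join(chunks)


doc = '<!DOCTYPE lec SYSTEM "swc.dtd"><lec><b2>blah&emdash;blah</b2></lec>'
dtd = '<!ENTITY emdash "&#8212;">'
text = parse_with_dtd(doc, dtd)
```

Note that this expands &emdash; into the character it stands for; it does
not keep the entity reference itself in the output, which would need
separate handling when the document is written back out.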