[XML-SIG] Newbie : Identifying characters that will choke XML parser

John Wilson tug@wilson.co.uk
Tue, 6 May 2003 21:55:40 +0100


I've done some poking around with minidom (I have never used it before). It
would appear that it does (correctly) replace & with &amp; but it does not
like characters with values > 127 and does not replace them with numeric
character entities.

I would suggest that you try using the full DOM implementation.
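
For illustration only, here is a minimal sketch of the behaviour discussed
in this thread, assuming a current Python 3 interpreter (which postdates this
message, so results under the minidom described above will differ). There,
toxml() with an encoding argument appears to write any character the target
encoding cannot represent as a numeric character reference, so asking for
us-ascii rather than iso-8859-1 should give &#180; in the output. The
attribute names b and c and their values are made up for the example.

```python
from xml.dom.minidom import parseString

doc1 = parseString('<test/>')
doc_node = doc1.documentElement

# U+00B4 cannot be encoded in us-ascii, so the serializer falls back
# to a numeric character reference for it.
doc_node.setAttributeNS(None, 'a', chr(180))

# A literal ampersand in a value is escaped as &amp; on output.
doc_node.setAttributeNS(None, 'b', 'R&D')

# Pre-escaping by hand does not help: the ampersand of the hand-written
# reference is itself escaped, giving the unwanted &amp;#180; form.
doc_node.setAttributeNS(None, 'c', '&#180;')

# With an encoding argument, toxml() returns bytes on Python 3.
source = doc1.toxml('us-ascii')
print(source.decode('us-ascii'))

# Parsing the result back should recover the original character.
round_trip = parseString(source).documentElement.getAttribute('a')
```

Choosing us-ascii is the assumption that matters here: with iso-8859-1 the
character is representable, so it would be written as a single raw byte
rather than as a reference.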

John Wilson
The Wilson Partnership
http://www.wilson.co.uk

----- Original Message ----- 
From: "Ian Sparks" <Ian.Sparks@etrials.com>
To: "James Oakley" <joakley@solutioninc.com>; <xml-sig@python.org>; "John
Wilson" <tug@wilson.co.uk>
Sent: Tuesday, May 06, 2003 5:45 PM
Subject: RE: [XML-SIG] Newbie : Identifying characters that will choke XML
parser


Thank you, James & John. Your solutions allow me to filter out what should be
marked as "bad" characters.

However, I'm having real problems with character conversions. I'm building
an XML document using minidom and setAttributeNS().

I want to be able to do something like :

from xml.dom.minidom import parseString

doc1 = parseString('<test/>')
docNode = doc1.childNodes[0]
docNode.setAttributeNS(None,'a',chr(180))
source = doc1.toxml('iso-8859-1')

and have source contain :

<?xml version="1.0" encoding="iso-8859-1" ?>
<test a="&#180;"/>

without getting UnicodeErrors from codecs.py on toxml() and without ending
up with :

<?xml version="1.0" encoding="iso-8859-1" ?>
<test a="&amp;#180;"/>

Either this is really hard or, more likely, I'm really ignorant.