[XML-SIG] checking a string for well-formedness
Paul Tremblay
phthenry@earthlink.net
Wed, 7 May 2003 12:04:18 -0400
I need to check a string for well-formedness. I stumbled across the
fact that you can use expat directly, so I devised this code, which
works so long as Unicode and entities aren't used:
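import sys
import xml.parsers.expat

def validate(data):
    # Create a fresh parser on every call: an expat parser cannot be
    # reused after it has been given the final chunk of a document.
    parser = xml.parsers.expat.ParserCreate()
    try:
        # Parse the whole string once; True marks it as the final chunk,
        # so errors such as an unclosed root element are reported too.
        parser.Parse(data, True)
        return 0
    except xml.parsers.expat.ExpatError:
        sys.stderr.write('tagging text will result in invalid XML\n')
        return 1

data = '<doc><tag>text</tag><tag>text,</tag></doc>'
validate(data)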
The function validate returns 0 in this case. However, if I try this:
data = u'<doc><tag>text</tag><tag>text\u201c</tag></doc>'
I get the following error:
Traceback (most recent call last):
  File "/home/paul/lib/python/paul/xml/expat.py", line 50, in ?
    parser.Parse(data)
UnicodeError: ASCII encoding error: ordinal not in range(128)
Any idea what is going on here?
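A likely explanation, offered tentatively: in the Python 2 used here, passing a `unicode` object to `Parse` makes Python convert it to a byte string with the default ASCII codec. That conversion fails on `\u201c` before expat ever sees the data. The sketch below shows one workaround. It assumes the string carries no XML declaration naming some other encoding. Under that assumption it encodes text to UTF-8 itself (expat's default when no declaration is present) and creates a fresh parser per call. The name `is_well_formed` is hypothetical, and the code targets Python 3, where the text type is `str`; under Python 2 the check would be against `unicode` instead.

```python
import xml.parsers.expat


def is_well_formed(data):
    """Return True if `data` parses as well-formed XML, else False.

    Assumption: text input has no XML declaration naming an encoding
    other than UTF-8, so encoding it to UTF-8 bytes is safe.
    """
    if isinstance(data, str):
        # Hand expat bytes in a known encoding instead of relying on an
        # implicit (and possibly ASCII-only) conversion.
        data = data.encode("utf-8")
    # A new parser per call; expat parsers are single-use.
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True: this is the final (only) chunk
    except xml.parsers.expat.ExpatError:
        return False
    return True
```

Undefined entities such as `&nbsp;` in a document without a DTD are reported as ordinary `ExpatError`s, so they show up as "not well-formed" rather than as a crash.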
I have rewritten the function so that it writes the string to a
file, and then I use SAX to parse the file. If SAX fails, I know I
have ill-formed XML. However, this second solution is a kludge; I
would like to test the string directly.
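For the SAX route, the temporary file may not be needed at all, because `xml.sax.parseString` parses an in-memory string. The sketch below is written for Python 3. It uses a hypothetical helper name, `is_well_formed_sax`, and makes the same assumption as above: the text can safely be encoded to UTF-8 because it carries no conflicting encoding declaration.

```python
import xml.sax
import xml.sax.handler


def is_well_formed_sax(data):
    """Check well-formedness with SAX without writing a temporary file."""
    if isinstance(data, str):
        # Assumption: no XML declaration naming a non-UTF-8 encoding.
        data = data.encode("utf-8")
    try:
        # A bare ContentHandler ignores every event; only errors matter.
        # The default error handler raises SAXParseException on fatal errors.
        xml.sax.parseString(data, xml.sax.handler.ContentHandler())
    except xml.sax.SAXParseException:
        return False
    return True
```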
Thanks
Paul
--
************************
*Paul Tremblay *
*phthenry@earthlink.net*
************************