[XML-SIG] unicode problems in elementtree
Bryan Lawrence
b.n.lawrence at rl.ac.uk
Fri May 26 22:22:41 CEST 2006
Hi Folks,
elementtree is barfing (well, to be correct, expat is barfing) on some
unicode strings I'm passing through to it ...
e.g.:
self = <ElementTree.XMLTreeBuilder instance>, self._parser =
<pyexpat.xmlparser object>, self._parser.Parse = <built-in method Parse of
pyexpat.xmlparser object>, data =
u'<DIF><Entry_ID>badc.nerc.ac.uk:DIF:NM_HiGEM_yaao...on_Date>2005-02-03</Last_DIF_Revision_Date></DIF>'
ExpatError: not well-formed (invalid token): line 1, column 11389
args = ('not well-formed (invalid token): line 1, column 11389',)
code = 4
lineno = 1
offset = 11389
For the record, we find [3 <= tau] in that block ... we also have problems with
degree symbols and the like ...
I suspect the problem is that I'm not actually passing an xml document (with a
character encoding definition) to ET ... I'm just passing some stuff which is
an xml fragment (from a web service interface to a database).
Do elementtree and/or expat need to know the encoding to get this right?
(That may be a problem, because this could come from anyone's document in any
encoding ...)
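
To make the question concrete, here is a minimal sketch of one way to take the guesswork away from the parser: encode the unicode fragment to UTF-8 bytes and prepend an explicit declaration before parsing. It uses the standard-library xml.etree.ElementTree from current Python rather than the 2006 elementtree package. The helper name parse_fragment and the sample fragment are hypothetical, and the sketch assumes the fragment does not already carry an XML declaration of its own.

```python
import xml.etree.ElementTree as ET


def parse_fragment(fragment):
    """Parse an XML fragment held as a text (unicode) string.

    Hypothetical helper: assumes the fragment has a single root element
    and no XML declaration of its own.
    """
    # Encode to UTF-8 and say so explicitly, so the parser does not
    # have to guess which encoding the bytes are in.
    declaration = b'<?xml version="1.0" encoding="UTF-8"?>'
    data = declaration + fragment.encode("utf-8")
    return ET.fromstring(data)


# Made-up fragment with a "less than or equal" sign, a Greek tau,
# and a degree symbol in the character data.
sample = "<DIF><Summary>3 \u2264 \u03c4, 45\u00b0N</Summary></DIF>"
root = parse_fragment(sample)
summary_text = root.find("Summary").text
```

This only covers the case where the data is already a decoded unicode string. If the web service hands back raw bytes in an unknown encoding, those would have to be decoded correctly first, and this sketch does not attempt that.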
(Sorry, I'm a bit unicode illiterate, and while I appreciate it's something I
should know, there is other stuff filling my mind at the moment ...)
Bryan