[XML-SIG] problem with elementtree 1.2.6
Chris Withers
chris at simplistix.co.uk
Tue Nov 27 11:48:44 CET 2007
Fredrik Lundh wrote:
>> Sorry if this should go to a list, I couldn't find one...
>> (please send me that way if there is one...)
>
> python-list/comp.lang.python or xml-sig are good choices.
OK, let's go with xml-sig :)
>> I've bumped into an annoying problem, which I actually think is a
>> problem with expat:
>>
>> >>> from xml.parsers import expat
>> >>> parser = expat.ParserCreate()
>> >>> def handle(data): print repr(data)
>> ...
>> >>> parser.CharacterDataHandler = handle
>> >>> parser.Parse('<xml>&lt;node/&gt;</xml>',0)
>> u'<'
>> u'node/'
>> u'>'
>> 1
>>
>> Now, why is expat unquoting those two entities?
>
> in an XML file, the characters < and & *must* be escaped (either as
> entity references or character references) when appearing in normal
> text:
Yes indeed.
> the following entities are predefined: &amp; (&) &lt; (<) &gt; (>)
> &quot; (") &apos; (').
Okay, so in the above, if I really mean &lt;, the xml should be:
'<xml>&amp;lt;/&amp;gt;</xml>'
Seems a little clunky, but okay...
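
A minimal Python 3 sketch of the rule discussed above (the original session is Python 2). expat hands the handler decoded text, so an entity reference only survives as literal text if its ampersand is itself escaped. The helper name `character_data` is made up for illustration; `escape` is the standard `xml.sax.saxutils.escape`.

```python
from xml.parsers import expat
from xml.sax.saxutils import escape


def character_data(xml_text):
    """Return all character data expat reports for xml_text, joined together."""
    chunks = []
    parser = expat.ParserCreate()
    # expat may deliver text in several pieces, so collect and join them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_text, True)
    return ''.join(chunks)


# Escaped once: the parser decodes the references back to markup characters.
decoded = character_data('<xml>&lt;node/&gt;</xml>')

# Escaped twice: the parser decodes one level and leaves literal references.
literal = character_data('<xml>&amp;lt;node/&amp;gt;</xml>')

# Applying escape() twice produces the doubly escaped form.
doubly_escaped = escape(escape('<node/>'))
```

Here `decoded` is the text `<node/>`, while `literal` is the text `&lt;node/&gt;`.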
I guess this was causing me problems as I'm working on a bug in Twiddler
(http://www.simplistix.co.uk/software/python/twiddler)
where quoted html was ending up unquoted after processing:
>>> from twiddler import Twiddler
>>> t = Twiddler('<span>&lt;b&gt;</span>')
>>> t.render()
u'<span><b></span>'
Now, I see how you fixed this in ElementTree by re-escaping all the
predefined entities (out of interest, why is the function called
_escape_cdata rather than _escape_data?) but I can't do that because I
want users to be able to insert chunks of html and choose whether or not
they are escaped:
>>> t = Twiddler('<span id="something"/>')
escaping:
>>> t['something'].replace('<b>')
>>> t.render()
u'<span id="something">&lt;b&gt;</span>'
no escaping:
>>> t['something'].replace('<b>',filters=())
>>> t.render()
u'<span id="something"><b></span>'
I guess in my use of ElementTree, I need to make sure character data is
re-escaped at the tree building stage?
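
For comparison, a small Python 3 sketch of the round trip as ElementTree itself handles it, using the standard-library `xml.etree.ElementTree` rather than the standalone elementtree 1.2.6 package discussed here (an assumption; the behaviour shown is that of the stdlib module). The tree stores decoded text, and escaping is applied again only at serialisation time.

```python
import xml.etree.ElementTree as ET

# Parsing decodes the entity references: the tree holds plain text.
elem = ET.fromstring('<span>&lt;b&gt;</span>')
stored_text = elem.text

# Serialising escapes the text again, so the markup characters stay quoted.
serialized = ET.tostring(elem, encoding='unicode')
```

With this design the tree never holds escaped text, so any choice between escaped and raw insertion has to be made when content is added or when the tree is written out, not by the parser.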
> other names give an error unless they've been
> explicitly defined.
So I see:
>>> from xml.parsers import expat
>>> parser = expat.ParserCreate()
>>> parser.Parse('<xml>&foo;</xml>',0)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
xml.parsers.expat.ExpatError: undefined entity: line 1, column 5
But why does calling UseForeignDTD suddenly make everything ok?
>>> parser = expat.ParserCreate()
>>> parser.UseForeignDTD()
>>> parser.Parse('<xml>&foo;</xml>',0)
1
What extra hooks get called as a result of calling UseForeignDTD?
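
One way to watch what happens is to set a `SkippedEntityHandler` on the parser. The following is a Python 3 sketch, assuming a pyexpat build with DTD support; the helper name `parse_undefined_entity` is made up for illustration. As far as I understand expat, once a foreign DTD is assumed, an undeclared entity might still be declared in that external subset, so the reference is reported as skipped instead of being rejected.

```python
from xml.parsers import expat


def parse_undefined_entity(use_foreign_dtd):
    """Parse a document with an undeclared entity; return skipped entity names."""
    skipped = []
    parser = expat.ParserCreate()
    if use_foreign_dtd:
        # Must be called before any data is passed to Parse().
        parser.UseForeignDTD(True)
    # Record entity references that expat skips rather than treats as errors.
    parser.SkippedEntityHandler = (
        lambda name, is_parameter_entity: skipped.append(name))
    parser.Parse('<xml>&foo;</xml>', True)
    return skipped


# Without a foreign DTD, the undeclared entity is a well-formedness error.
try:
    parse_undefined_entity(False)
    strict_error = None
except expat.ExpatError as exc:
    strict_error = str(exc)

# With a foreign DTD, the parse succeeds and the reference is reported as skipped.
skipped_names = parse_undefined_entity(True)
```

If an `ExternalEntityRefHandler` is also set, the Python documentation says it is called with `None` for all its arguments, which is the hook for supplying the alternate DTD.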
cheers,
Chris
--
Simplistix - Content Management, Zope & Python Consulting
- http://www.simplistix.co.uk