[XML-SIG] problem with elementtree 1.2.6

Fredrik Lundh fredrik at pythonware.com
Thu Nov 29 00:33:08 CET 2007


Chris Withers wrote:

>> That's how escaping works, be it in XML, encodings, compression, whatever.
> 
> Well yes and no. I'd expect escaping to work such that whatever we're 
> dealing with can be round tripped, i.e. parsed, serialized, parsed 
> again, etc.

that's exactly how it works in ET, of course.  you put Python strings in 
the tree, the ET parsers and serializers take care of the rest.

     elem = ET.Element("tag")
     elem.text = value # ASCII or Unicode string

     ... write to disk ...
     ... read it back ...

     assert elem.text == value
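
a runnable version of the round trip above might look like the sketch below. it assumes the standard-library xml.etree.ElementTree module in a current Python; the sample string, the temporary file name, and the names raw and elem2 are made up for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# hypothetical sample value with markup characters and non-ASCII text
value = 'tom & jerry <b>not markup</b> \u00e5\u00e4\u00f6'

elem = ET.Element("tag")
elem.text = value  # plain Python string; no manual escaping

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "roundtrip.xml")

    # write to disk; the serializer escapes & < > as needed
    ET.ElementTree(elem).write(path, encoding="utf-8")

    # keep the raw bytes around to look at what was stored
    with open(path, "rb") as f:
        raw = f.read()

    # read it back; the parser undoes the escaping
    elem2 = ET.parse(path).getroot()

assert elem2.text == value
```

the file contains &amp; and &lt; on disk, but the string you get back is the string you put in.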

>> You can read the SGML spec regarding CDATA.
> 
> Not sure what that's supposed to mean. CDATA for me means stuff inside a 
> <![CDATA[ ]]> section. _escape_cdata is used for everything inside any 
> tag that isn't another tag.

cdata is character data; see

     http://www.w3.org/TR/html401/types.html#h-6.2

that's not the same thing as a "CDATA section" (which is just one of 
several ways to store character data in an XML file).  how things are 
stored doesn't matter; that's just a serialization detail:

     http://www.w3.org/TR/xml-infoset/#omitted

     What is not in the Information Set

     6. Whether characters are represented by character references.
     19. The boundaries of CDATA marked sections.
     ...
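
a small sketch of what that means in practice, assuming the standard-library xml.etree.ElementTree module (the three sample documents are made up): entity references, numeric character references, and a CDATA section all produce the same character data once parsed.

```python
import xml.etree.ElementTree as ET

# the same character data, stored three different ways
with_entities = ET.fromstring("<tag>1 &lt; 2 &amp; 3</tag>")
with_charrefs = ET.fromstring("<tag>1 &#60; 2 &#38; 3</tag>")
with_cdata = ET.fromstring("<tag><![CDATA[1 < 2 & 3]]></tag>")

# after parsing, the tree does not record which form was used
assert with_entities.text == "1 < 2 & 3"
assert with_charrefs.text == with_entities.text
assert with_cdata.text == with_entities.text
```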

> I and many others do not ;-) When writing content into an html template, 
> that content often comes from other sources that spit out lumps of html. 
> Being able to insert them without escaping is a common use case.

HTML might be similar to XML, but an XML parser cannot parse HTML, so 
you cannot insert HTML fragments into an XML document without either
escaping them, or pre-processing them to make sure they're well-formed.
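
to illustrate, here is a minimal sketch assuming the standard-library xml.etree.ElementTree module; both fragments are made-up examples. an unclosed <br> is fine in HTML but is not well-formed XML, so the parser rejects it.

```python
import xml.etree.ElementTree as ET

# acceptable HTML, but not well-formed XML: <br> is never closed
html_fragment = "<p>line one<br>line two</p>"

try:
    ET.XML(html_fragment)
    parsed_ok = True
except ET.ParseError:
    parsed_ok = False

# the same content as well-formed XHTML parses without complaint
xhtml_fragment = "<p>line one<br />line two</p>"
p = ET.XML(xhtml_fragment)

assert not parsed_ok
assert p[0].tag == "br"
```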

if you want to insert literal XML fragments in an ET tree, use the XML 
factory function:

     fragment = "<tag>...</tag>"
     elem.append(ET.XML(fragment))
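
the difference between storing a fragment as text and appending it as a subtree can be sketched like this, assuming the standard-library xml.etree.ElementTree module; the body element and the fragment are made up for illustration.

```python
import xml.etree.ElementTree as ET

fragment = "<p>hello <b>world</b></p>"

# as character data: the serializer escapes the markup characters
as_text = ET.Element("body")
as_text.text = fragment
text_xml = ET.tostring(as_text, encoding="unicode")

# as a subtree: the fragment is parsed first, then appended as elements
as_tree = ET.Element("body")
as_tree.append(ET.XML(fragment))
tree_xml = ET.tostring(as_tree, encoding="unicode")

assert text_xml == "<body>&lt;p&gt;hello &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;</body>"
assert tree_xml == "<body><p>hello <b>world</b></p></body>"
```

in the first case the tree holds one string; in the second it holds a p element with a b child, which you can navigate and modify like any other part of the tree.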

if you want to embed HTML fragments in an ET tree, use ElementTidy or 
ElementSoup (or equivalent) to turn the fragment into properly nested 
and properly namespaced XHTML.

if you want to do unstructured string handling, use a template library 
or Python strings.  don't use an XML library if you don't want to work 
with XML.

> That's true, sometimes. That inserted lump may have come from a process 
> which can only spit out perfect html fragments, in which case you're 
> fine, or it may come from user input, in which case you're doomed but 
> will likely have happy customers ;-)

the hackers will be happy, at least:

     http://en.wikipedia.org/wiki/Cross_site_scripting

</F>


