XML DOM: XML/XHTML inside a text node

Alan Kennedy alanmk at hotmail.com
Fri Nov 4 14:45:07 EST 2005


[noahlt at gmail.com]
> In my program, I get input from the user and insert it into an XHTML
> document.  Sometimes, this input will contain XHTML, but since I'm
> inserting it as a text node, xml.dom.minidom escapes the angle brackets
> ('<' becomes '&lt;', '>' becomes '&gt;').  I want to be able to
> override this behavior cleanly.

Why?

You need to make a decision on how the contained xhtml is treated after 
it has been inserted into the document.

1. If it is simply textual payload, then it should be perfectly 
acceptable to escape those characters. Or you could include it as a 
CDATA section.

2. If it needs to become a structural part of the xml document, i.e. the 
elements are structurally incorporated into the document, then you need 
to transform it into nodes somehow, e.g. by parsing it with sax and 
building the nodes yourself. It would probably be easier, though, to 
parse it into a separate DOM and import the generated root node into 
your document.
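
The two options above can be sketched with xml.dom.minidom roughly as 
follows. This is only an illustration, not tested against your program: 
the function names and the <wrapper> element are made up for the 
example, and option 2 assumes the user input is well-formed XML.

```python
from xml.dom import minidom


def insert_as_text(doc, parent, user_input):
    # Option 1: plain text payload; '<' and '>' are escaped on output.
    parent.appendChild(doc.createTextNode(user_input))


def insert_as_cdata(doc, parent, user_input):
    # Option 1, CDATA variant: markup characters are written unescaped.
    # minidom raises ValueError on output if the data contains ']]>'.
    parent.appendChild(doc.createCDATASection(user_input))


def insert_as_nodes(doc, parent, user_input):
    # Option 2: parse the input into a separate DOM, then import it.
    # The <wrapper> element (hypothetical name) gives the fragment a
    # single root; parseString raises ExpatError on malformed input.
    fragment = minidom.parseString("<wrapper>" + user_input + "</wrapper>")
    for child in list(fragment.documentElement.childNodes):
        parent.appendChild(doc.importNode(child, True))


if __name__ == "__main__":
    doc = minidom.parseString("<body><p/></body>")
    p = doc.documentElement.firstChild
    insert_as_nodes(doc, p, "some <b>bold</b> text")
    print(p.toxml())
```

With insert_as_text the <b> stays visible as literal text in the 
rendered page; with insert_as_nodes it becomes a real element of your 
document, which is why the question of trust below matters.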

Is this xhtml coming from a trusted source? Or are you accepting it from 
strangers, over the internet? If the latter, there are security concerns 
relating to XSS attacks that you need to be aware of.

See the following archive post for how to clean up untrusted (x)html.

http://groups.google.com/group/comp.lang.python/browse_thread/thread/fbdc7ae20353a36d/91b6510990a25f9a
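
To give a rough idea of what such cleaning involves, here is a minimal 
allowlist sketch over a minidom tree. It is not the method from the 
post linked above, and it is not a complete XSS defence. The set of 
allowed tags and the function name are assumptions made for this 
example; it drops every attribute and removes disallowed elements 
together with their content.

```python
from xml.dom import minidom

# Hypothetical allowlist; adjust to whatever markup you want to permit.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "br"}


def strip_untrusted(node):
    """Recursively clean the children of `node` in place."""
    for child in list(node.childNodes):
        if child.nodeType == child.ELEMENT_NODE:
            if child.tagName not in ALLOWED_TAGS:
                # Drop the element and everything inside it.
                node.removeChild(child)
                child.unlink()
                continue
            # Remove all attributes (onclick, style, href, ...).
            for name in list(child.attributes.keys()):
                child.removeAttribute(name)
            strip_untrusted(child)
        elif child.nodeType != child.TEXT_NODE:
            # Comments, processing instructions, CDATA sections, etc.
            node.removeChild(child)


if __name__ == "__main__":
    doc = minidom.parseString(
        '<div>hi <b onclick="x()">there</b><script>alert(1)</script></div>'
    )
    strip_untrusted(doc.documentElement)
    print(doc.documentElement.toxml())
```

You would run something like this on the separately parsed fragment 
before importing its nodes into your own document.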

HTH,

-- 
alan kennedy
------------------------------------------------------
email alan:              http://xhaus.com/contact/alan


