ElementTree and CDATA handling

alainpoint at yahoo.fr
Wed Jun 1 09:06:42 EDT 2005


I am experimenting with ElementTree and I came across some
(apparently) weird behaviour.
I would expect a piece of XML to be read, parsed and written back
without corruption (except for comments and PIs, which have purposely
been left out). That is not the case, however, when it comes to CDATA
handling.
I have the following code:
text="""<html><head>
<title>Document</title>
</head>
<body>
<script type="text/javascript">
//<![CDATA[
function matchwo(a,b)
{
if (a < b && a > 0) then
   {
   return 1
   }
}
//]]>
</script>
</body>
</html>
"""

from elementtree import ElementTree
tree = ElementTree.fromstring(text)
ElementTree.dump(tree)

Running the above piece of code yields the following:

<html><head>
<title>Document</title>
</head>
<body>
<script type="text/javascript">
//
function matchwo(a,b)
{
if (a &lt; b &amp;&amp; a &gt; 0) then
   {
   return 1
   }
}
//
</script>
</body>
</html>

There are two problems: the <![CDATA[ and ]]> markers have disappeared,
and the <, > and && have been replaced by their equivalent entities
(CDATA should have prevented that).
I am no XML/HTML expert, so I might be doing something wrong...
Thank you for helping
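
For what it is worth, here is a minimal sketch of the round trip on a shortened version of the document. It assumes the standard-library xml.etree.ElementTree module behaves like the standalone elementtree package used above (I have not compared the two); the variable names are only for illustration. It suggests that the parsed text still holds the raw <, > and && characters, and that only the serialized form uses entities:

```python
import xml.etree.ElementTree as ET  # assumption: stdlib module, not the old elementtree package

text = """<html><body>
<script type="text/javascript">
//<![CDATA[
if (a < b && a > 0) { return 1 }
//]]>
</script>
</body></html>
"""

root = ET.fromstring(text)
script = root.find("body/script")

# The parser returns the characters inside the CDATA section as ordinary text;
# the <![CDATA[ and ]]> markers themselves are not kept in the tree.
script_text = script.text

# Serializing escapes &, < and > rather than re-creating a CDATA section.
serialized = ET.tostring(root, encoding="unicode")

# Parsing the serialized form again gives back the same text.
reparsed_text = ET.fromstring(serialized).find("body/script").text

print(repr(script_text))
print(serialized)
```

If this holds, the two forms are equivalent to an XML parser, but the escaped output is no longer usable as-is by a consumer that treats the script body as raw text.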

Alain



