[XML-SIG] elementtree and uncomplete parsing

Fredrik Lundh fredrik at pythonware.com
Tue Jul 15 13:22:16 CEST 2008


jeanmarc.chourot at free.fr wrote:

> I would like to retrieve what is between the tags <node> ...</node> as
> strings, with the "subelements" treated as plain text and not processed
> by ElementTree.
> In other words, this could be badly formed HTML, left unprocessed and embedded
> in well-formed XML tags.
> 
> i.e. :
> string1 = "This text <thistag> is completely crap </thistag> because
> <anothertag> blabla </anothertag>"
> string2="This is another <thisnotag> node </thisnotag> with <anothertaggy>
> random tags </anothertaggy>"

You say parse, but your description suggests that you want to
serialize the contents of an XML node without the outermost
element.  Is that correct?

In ET 1.3, you can do this by setting the tag to None and then 
serializing the node as usual, but to do this in 1.2 (as shipped with 
Python 2.5), you need to process the string afterwards.
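
For illustration, here is a minimal sketch of the tag-set-to-None approach.
It assumes a recent Python 3, where xml.etree.ElementTree skips the start
and end tags of an element whose tag is None.  The helper name
inner_xml_tag_none is made up for this example, and it works on a shallow
copy so that the original node keeps its tag:

```python
import copy
import xml.etree.ElementTree as ET


def inner_xml_tag_none(node):
    """Serialize the contents of node without its own start/end tags.

    Hypothetical helper, not part of ElementTree.
    """
    clone = copy.copy(node)  # shallow copy: children are shared, not modified
    clone.tag = None         # a tag of None makes the serializer skip the tags
    clone.tail = None        # drop any text that follows the element
    return ET.tostring(clone, encoding="unicode")


root = ET.fromstring(
    "<root><node>something some other thing <tag>hello</tag> text</node></root>"
)
print(inner_xml_tag_none(root.find("node")))
```

Working on a copy is a design choice for the sketch; if you do not need the
node afterwards, you can set the tag on the node itself.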

Assuming the element you want to serialize is in the variable "node", you 
can do:

 >>> node
<Element node at c770d0>
 >>> s = ET.tostring(node)
 >>> s
'<node>something some other thing <tag>hello</tag> text</node>'
 >>> _, _, s = s.partition(">") # chop off first tag
 >>> s, _, _ = s.rpartition("<") # chop off last tag
 >>> s
'something some other thing <tag>hello</tag> text'
 >>>

Alternatively, you can "normalize" the node and use ordinary slicing:

 >>> node.tag = "node" # make sure we know what it is
 >>> node.attrib.clear()
 >>> s = ET.tostring(node)
 >>> s = s[6:-7]
 >>> s
'something some other thing <tag>hello</tag> text'
 >>>
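
If you would rather not cut up the serialized string at all, another option
is to assemble the result from the node's leading text plus each serialized
child.  This is only a sketch, assuming Python 3 (where tostring accepts
encoding="unicode" and returns a str); the helper name inner_xml is made up
for this example:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def inner_xml(node):
    """Return the serialized contents of node, without its own tags.

    Hypothetical helper, not part of ElementTree.
    """
    # Leading text of the element, escaped as it would be in serialized XML.
    parts = [escape(node.text or "")]
    for child in node:
        # tostring() on a child also emits the child's tail text.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


root = ET.fromstring(
    "<root><node>something some other thing <tag>hello</tag> text</node></root>"
)
print(inner_xml(root.find("node")))
# prints: something some other thing <tag>hello</tag> text
```

This does not depend on the tag name or on the node's attributes, so there is
no need to normalize the node first.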

</F>


