[XML-SIG] elementtree and uncomplete parsing
Fredrik Lundh
fredrik at pythonware.com
Tue Jul 15 13:22:16 CEST 2008
jeanmarc.chourot at free.fr wrote:
> I would like to retrieve what is between the tags <node> ...</node> into
> strings, the "subelements" being treated as plain strings and not processed
> by ElementTree.
> In other words, this could be badly formed HTML, left unprocessed, embedded in
> well-formed XML tags.
>
> i.e. :
> string1 = "This text <thistag> is completely crap </thistag> because
> <anothertag> blabla </anothertag>"
> string2="This is another <thisnotag> node </thisnotag> with <anothertaggy>
> random tags </anothertaggy>"
You say parse, but your description seems to say that you want to
serialize the contents of an XML node, but without getting the outermost
element. Is that correct?
In ET 1.3, you can do this by setting the tag to None and then
serializing the node as usual, but to do this in 1.2 (as shipped with
Python 2.5), you need to process the string afterwards.
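For readers on ET 1.3 or later (the version bundled with Python 2.7 and 3.x), the tag-to-None approach might look like the sketch below. The helper name inner_xml is made up for illustration, and the shallow copy and encoding="unicode" (Python 3 only) are my own additions so that the original node stays untouched and the result is a str rather than bytes.

```python
import copy
import xml.etree.ElementTree as ET

def inner_xml(node):
    """Serialize the content of `node` without its own start/end tags.

    Hypothetical helper; assumes ET 1.3+ where a tag of None is
    serialized as "text and children only".
    """
    clone = copy.copy(node)  # shallow copy, so the caller's node keeps its tag
    clone.tag = None         # no start/end tag is written for this element
    clone.tail = None        # ignore any text that followed the element
    return ET.tostring(clone, encoding="unicode")

node = ET.fromstring(
    "<node>something some other thing <tag>hello</tag> text</node>"
)
print(inner_xml(node))
```

The children are shared between the copy and the original but are not modified, so the original tree can still be used afterwards.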
Assuming the element you want to serialize is in the variable "node", you
can do:
>>> node
<Element node at c770d0>
>>> s = ET.tostring(node)
>>> s
'<node>something some other thing <tag>hello</tag> text</node>'
>>> _, _, s = s.partition(">") # chop off first tag
>>> s, _, _ = s.rpartition("<") # chop off last tag
>>> s
'something some other thing <tag>hello</tag> text'
>>>
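The same chopping can be wrapped in a small function. This is only a sketch: the name inner_xml_by_partition is hypothetical, and it assumes Python 3, where ET.tostring() returns bytes unless encoding="unicode" is passed. It relies on ET escaping ">" and "<" inside attribute values and text, so the first ">" always closes the start tag and the last "<" always opens the end tag.

```python
import xml.etree.ElementTree as ET

def inner_xml_by_partition(node):
    """Return the serialized content of `node` minus its outermost tags.

    Hypothetical helper built on the partition/rpartition trick.
    """
    s = ET.tostring(node, encoding="unicode")  # str instead of bytes
    _, _, s = s.partition(">")   # chop off everything up to the first ">"
    s, _, _ = s.rpartition("<")  # chop off the end tag (and any tail text)
    return s

node = ET.fromstring(
    '<node id="1">something some other thing <tag>hello</tag> text</node>'
)
print(inner_xml_by_partition(node))
```

Unlike fixed slicing, this does not depend on the tag name or on the attributes, and an empty element simply yields an empty string.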
Alternatively, you can "normalize" the node and use ordinary slicing:
>>> node.tag = "node" # make sure we know what it is
>>> node.attrib.clear()
>>> s = ET.tostring(node)
>>> s = s[6:-7]
>>> s
'something some other thing <tag>hello</tag> text'
>>>
</F>