Finding all instances of a string in an XML file
Peter Otten
__peter__ at web.de
Fri Jun 21 02:16:00 EDT 2013
Jason Friedman wrote:
> I have XML which looks like:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE KMART SYSTEM "my.dtd">
> <LEVEL_1>
> <LEVEL_2 ATTR="hello">
> <ATTRIBUTE NAME="Property X" VALUE ="2"/>
> </LEVEL_2>
> <LEVEL_2 ATTR="goodbye">
> <ATTRIBUTE NAME="Property Y" VALUE ="NULL"/>
> <LEVEL_3 ATTR="aloha">
> <ATTRIBUTE NAME="Property X" VALUE ="3"/>
> </LEVEL_3>
> <ATTRIBUTE NAME="Property Z" VALUE ="welcome"/>
> </LEVEL_2>
> </LEVEL_1>
>
> The "Property X" string appears twice and I want to output the "path"
> that leads to each such appearance. In this case the output would be:
>
> LEVEL_1 {}, LEVEL_2 {"ATTR": "hello"}, ATTRIBUTE {"NAME": "Property X",
> "VALUE": "2"}
> LEVEL_1 {}, LEVEL_2 {"ATTR": "goodbye"}, LEVEL_3 {"ATTR": "aloha"},
> ATTRIBUTE {"NAME": "Property X", "VALUE": "3"}
>
> My actual XML file is 2000 lines and contains up to 8 levels of nesting.
That's still small, so you can parse the whole document into memory and
walk the tree recursively:
xml = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE KMART SYSTEM "my.dtd">
<LEVEL_1>
<LEVEL_2 ATTR="hello">
<ATTRIBUTE NAME="Property X" VALUE ="2"/>
</LEVEL_2>
<LEVEL_2 ATTR="goodbye">
<ATTRIBUTE NAME="Property Y" VALUE ="NULL"/>
<LEVEL_3 ATTR="aloha">
<ATTRIBUTE NAME="Property X" VALUE ="3"/>
</LEVEL_3>
<ATTRIBUTE NAME="Property Z" VALUE ="welcome"/>
</LEVEL_2>
</LEVEL_1>
"""
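import xml.etree.ElementTree as etree

tree = etree.fromstring(xml)

def walk(elem, path, token):
    # Extend the path with the current element. Tuples are immutable,
    # so each recursion level works on its own copy.
    path += (elem,)
    if token in elem.attrib.values():
        yield path
    # Iterate over the element directly; getchildren() was removed in
    # Python 3.9.
    for child in elem:
        for match in walk(child, path, token):
            yield match

for path in walk(tree, (), "Property X"):
    print(", ".join("{} {}".format(elem.tag, elem.attrib) for elem in path))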
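Note that formatting elem.attrib directly prints Python's dict repr, which uses single quotes, whereas the requested output shows double quotes. One possible way to get closer to that format is to serialize each attribute dict with json.dumps. The following is only a sketch: the names SAMPLE, find_paths and format_path are made up for illustration, the sample omits the XML declaration and DOCTYPE for brevity, and it assumes Python 3.7 or later so that attributes keep their document order.

```python
import json
import xml.etree.ElementTree as ET

# Shortened copy of the sample document (no declaration or DOCTYPE).
SAMPLE = """<LEVEL_1>
<LEVEL_2 ATTR="hello">
<ATTRIBUTE NAME="Property X" VALUE ="2"/>
</LEVEL_2>
<LEVEL_2 ATTR="goodbye">
<ATTRIBUTE NAME="Property Y" VALUE ="NULL"/>
<LEVEL_3 ATTR="aloha">
<ATTRIBUTE NAME="Property X" VALUE ="3"/>
</LEVEL_3>
<ATTRIBUTE NAME="Property Z" VALUE ="welcome"/>
</LEVEL_2>
</LEVEL_1>
"""

def find_paths(elem, token, path=()):
    """Yield a tuple of elements (root first) for every element
    that has `token` as one of its attribute values."""
    path += (elem,)
    if token in elem.attrib.values():
        yield path
    for child in elem:
        yield from find_paths(child, token, path)

def format_path(path):
    """Render a path as 'TAG {json attrs}, TAG {json attrs}, ...'."""
    return ", ".join(
        "{} {}".format(elem.tag, json.dumps(elem.attrib)) for elem in path
    )

root = ET.fromstring(SAMPLE)
lines = [format_path(p) for p in find_paths(root, "Property X")]
for line in lines:
    print(line)
```

Collecting the formatted lines in a list before printing is just a convenience here; for a much larger file you could print each path as soon as the generator yields it.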