lxml and xpath(?)

Peter Otten __peter__ at web.de
Wed Oct 26 09:56:42 EDT 2016


Doug OLeary wrote:

> Hey;
> 
> Reasonably new to python and incredibly new to xml much less trying to
> parse it. I need to identify cluster nodes from a series of weblogic xml
> configuration files. I've figured out how to get 75% of them; now, I'm
> going after the edge case and I'm unsure how to proceed.
> 
> Weblogic xml config files start with namespace definitions then a number
> of child elements some of which have children of their own.
> 
> The element that I'm interested in is <server> which will usually have a
> subelement called <listen-address> containing the hostname that I'm
> looking for.
> 
> Following the paradigm of "we love standards, we got lots of them", this
> model doesn't work everywhere. Where it doesn't work, I need to look for a
> subelement of <server> called <machine>. That element contains an alias
> which is expanded in a different root child, at the same level as
> <server>.
> 
> So, picture worth a 1000 words:
> 
> <?xml version='1.0' encoding='UTF-8'?>
> < [[ heinous namespace xml snipped ]] >
>    <name>[[text]]</name>
>    ...
>    <server>
>       <name>EDIServices_MS1</name>
>       ...
>       <machine>EDIServices_MC1</machine>
>       ...
>    </server>
>    <server>
>       <name>EDIServices_MS2</name>
>       ...
>       <machine>EDIServices_MC2</machine>
>       ...
>    </server>
>    <machine xsi:type="unix-machineType">
>      <name>EDIServices_MC1</name>
>      <node-manager>
>        <name>EDIServices_MC1</name>
>        <nm-type>SSL</nm-type>
>        <listen-address>host001</listen-address>
>        <listen-port>7001</listen-port>
>      </node-manager>
>    </machine>
>    <machine xsi:type="unix-machineType">
>      <name>EDIServices_MC2</name>
>      <node-manager>
>        <name>EDIServices_MC2</name>
>        <listen-address>host002</listen-address>
>        <listen-port>7001</listen-port>
>      </node-manager>
>    </machine>
> </domain>
> 
> So, running it on 'normal' config, I get:
> 
> $ ./lxml configs/EntsvcSoa_Domain_config.xml
> EntsvcSoa_CS    => host003.myco.com
> EntsvcSoa_CS   => host004.myco.com
> 
> Running it against the abi-normal config, I'm currently getting:
> 
> $ ./lxml configs/EDIServices_Domain_config.xml
> EDIServices_CS => EDIServices_MC1
> EDIServices_CS => EDIServices_MC2
> 
> Using the examples above, I would like to translate EDIServices_MC1 and
> EDIServices_MC2 to host001 and host002 respectively.
> 
> The primary loop is:
> 
> for server in root.findall('ns:server', namespaces):
>   cs = server.find('ns:cluster', namespaces)
>   if cs is None:
>     continue
>   # cluster_name = server.find('ns:cluster', namespaces).text
>   cluster_name = cs.text
>   listen_address = server.find('ns:listen-address', namespaces)
>   server_name = listen_address.text
>   if server_name is None:
>     machine = server.find('ns:machine', namespaces)
>     if machine is None:
>       continue
>     else:
>       server_name = machine.text
> 
>   print("%-15s => %s" % (cluster_name, server_name))
> 
> (it's taken me days to write 12 lines of code... good thing I don't do
> this for a living :) )

You tend to get more efficient when you read the tutorial before you start 
writing code. Hard-won advice that I still don't always follow myself ;)
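
One thing to watch in the loop above: find() returns None when the child is 
missing, so listen_address.text raises AttributeError exactly in the case 
where you want the <machine> fallback. Below is a rough sketch of a 
None-safe variant using findtext(). The namespace URI, the sample document 
and the function name cluster_members() are made up for illustration; 
substitute whatever your config files actually declare.

```python
try:
    from lxml import etree
except ImportError:  # the stdlib module offers the same find*() calls
    import xml.etree.ElementTree as etree

# Hypothetical namespace URI; use the one declared in your config file.
NAMESPACES = {"ns": "http://example.com/domain"}

# Made-up sample: one server with an address, one with only a machine
# alias, and one that is not part of any cluster.
XML = b"""\
<domain xmlns="http://example.com/domain">
  <server>
    <name>MS1</name>
    <cluster>CS</cluster>
    <listen-address>host003.myco.com</listen-address>
  </server>
  <server>
    <name>MS2</name>
    <cluster>CS</cluster>
    <machine>MC2</machine>
  </server>
  <server>
    <name>AdminServer</name>
  </server>
</domain>
"""


def cluster_members(root):
    """Yield (cluster, address or machine alias) for each clustered server."""
    for server in root.findall("ns:server", NAMESPACES):
        # findtext() returns None for a missing child instead of raising
        cluster = server.findtext("ns:cluster", namespaces=NAMESPACES)
        if cluster is None:
            continue
        address = server.findtext("ns:listen-address", namespaces=NAMESPACES)
        if not address:
            # missing or empty <listen-address>: fall back to the alias
            address = server.findtext("ns:machine", namespaces=NAMESPACES)
        if address:
            yield cluster, address


root = etree.fromstring(XML)
for cluster, address in cluster_members(root):
    print("%-15s => %s" % (cluster, address))
```

This only gets you as far as the machine alias; translating the alias into 
a host name is a separate lookup.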

> 
> Rephrased, I need to find the <listen-address> under the <machine> child
> whose name matches the name under the corresponding <server> child. From
> some of the examples on the web, I believe xpath might help but I've not
> been able to get even the simple examples working. Go figure, I just
> figured out what a namespace is...
> 
> Any hints/tips/suggestions greatly appreciated especially with complete
> noob tutorials for xpath.

Use your favourite search engine. One advantage of XPath is that it's not 
limited to Python, so tutorials written for other languages apply as well.

I did not completely follow your question, so the example below is my 
interpretation of what you are asking for. It may still help you get 
started...

$ cat lxml_translate_host.py
from lxml import etree

s = """\
<?xml version='1.0' encoding='UTF-8'?>
<domain>
   <name>text</name>
   <server>
      <name>EDIServices_MS1</name>
      <machine>EDIServices_MC1</machine>
   </server>
   <server>
      <name>EDIServices_MS2</name>
      <machine>EDIServices_MC2</machine>
   </server>
   <machine type="unix-machineType">
     <name>EDIServices_MC1</name>
     <node-manager>
       <name>EDIServices_MC1</name>
       <nm-type>SSL</nm-type>
       <listen-address>host001</listen-address>
       <listen-port>7001</listen-port>
     </node-manager>
   </machine>
   <machine type="unix-machineType">
     <name>EDIServices_MC2</name>
     <node-manager>
       <name>EDIServices_MC2</name>
       <listen-address>host002</listen-address>
       <listen-port>7001</listen-port>
     </node-manager>
   </machine>
</domain>
""".encode()

root = etree.fromstring(s)
for server in root.xpath("./server"):
    servername = server.xpath("./name/text()")[0]
    print("server", servername)
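    machine = server.xpath("./machine/text()")[0]
    print("machine", machine)
    # machine is interpolated into the XPath below, so validate it first
    if not machine.isidentifier():
        raise ValueError("Kind regards to Bobby Tables' Mom")
    # look up the sibling <machine> whose <name> matches the alias
    path = ("../machine[name='{}']/node-manager/"
            "listen-address/text()").format(machine)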
    host = server.xpath(path)[0]
    print("host", host)
    print()
$ python3 lxml_translate_host.py 
server EDIServices_MS1
machine EDIServices_MC1
host host001

server EDIServices_MS2
machine EDIServices_MC2
host host002

$
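
Two things the script above glosses over: your real files use namespaces, 
and formatting a value into the XPath string is what makes the 
isidentifier() check necessary. lxml's xpath() accepts a namespaces mapping 
and fills XPath variables ($name) from keyword arguments, which avoids the 
string formatting altogether. A minimal sketch along those lines follows. 
The namespace URI, the trimmed-down sample document and the helper name 
machine_host() are assumptions for illustration, not anything taken from 
your configs.

```python
from lxml import etree

# Hypothetical namespace URI; use the one your config file declares.
NS = {"ns": "http://example.com/domain"}

# Trimmed-down, made-up sample with a default namespace.
XML = b"""\
<domain xmlns="http://example.com/domain">
  <server>
    <name>EDIServices_MS1</name>
    <machine>EDIServices_MC1</machine>
  </server>
  <machine>
    <name>EDIServices_MC1</name>
    <node-manager>
      <name>EDIServices_MC1</name>
      <listen-address>host001</listen-address>
    </node-manager>
  </machine>
</domain>
"""

# $alias is an XPath variable; lxml fills it from the keyword argument,
# so the value never becomes part of the expression text.
HOST_PATH = ("../ns:machine[ns:name=$alias]/ns:node-manager/"
             "ns:listen-address/text()")


def machine_host(server):
    """Return the listen address of the machine a <server> names, or None."""
    alias = server.findtext("ns:machine", namespaces=NS)
    if alias is None:
        return None
    hosts = server.xpath(HOST_PATH, namespaces=NS, alias=alias)
    return hosts[0] if hosts else None


root = etree.fromstring(XML)
for server in root.xpath("./ns:server", namespaces=NS):
    print(server.findtext("ns:name", namespaces=NS), "=>", machine_host(server))
```

Note that every step of the path needs the prefix; an unprefixed name in 
XPath 1.0 only matches elements in no namespace, even when the document 
uses a default namespace.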




