DOM - some pointers

Andrew Dalke dalke at dalkescientific.com
Tue Dec 18 18:10:39 EST 2001


infotechsys.wayne at verizon.net:
>Could someone point me to some documentation that shows how to use
>HTML, DOM and Python together. I did a google search, but the only
>thing I find is DOM, XML and Python.

I confess to being confused as well.  Here is an example of
what I want to do and how I tried to do it.

I have an /etc/passwd-like XML format like this:

<passwd>
  <entry>
   <account>dalke</account>
   <password>*</password>
   < .... >
   <shell>/bin/tcsh</shell>
  </entry>
  <entry>
   <account>root</account>
  ....
</passwd>

I want to change my shell entry to /bin/bash.  I tried
the following (with Python 2.0, but I doubt 2.2 has changed
things):

>>> from xml.dom import minidom
>>> doc = minidom.parseString("<passwd><entry>"
...                  "<account>dalke</account>"
...                  "<shell>/bin/tcsh</shell>"
...                  "</entry></passwd>")
>>> doc.normalize()
>>> for entry in doc.getElementsByTagName("entry"):
...    account = entry.getElementsByTagName("account")[0]
...    if account.firstChild.nodeValue == "dalke":
...        shell = entry.getElementsByTagName("shell")[0]
...        shell.firstChild.nodeValue = u"/bin/bash"
...        break
... else:
...    print "dalke not found"
...
>>> doc.toxml()
u'<passwd><entry><account>dalke</account><shell>/bin/tcsh</shell></entry></passwd>'
>>> shell
<DOM Element: shell at 4836174280>
>>> shell.firstChild.nodeValue
u'/bin/bash'
>>> shell.firstChild.data = u"/bin/bash"
>>> doc.toxml()
u'<passwd><entry><account>dalke</account><shell>/bin/bash</shell></entry></passwd>'

My questions are:
  1) Why does it take so much work to do this?
  2) Why doesn't the XML output contain the new shell name when
     I change "nodeValue"?
  3) Why does the XML output change when I change "data", and
     is that the right way to change the value?
  4) Is there any way to dump just the raw characters as text
     (not as XML), and if so, how?
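
For question 4, one possible approach is to walk the child nodes and
join the "data" of each text node.  The sketch below assumes a current
Python 3 minidom rather than the 2.0 release used in the session above,
and get_text is a hypothetical helper name, not part of xml.dom.

```python
from xml.dom import minidom


def get_text(node):
    """Concatenate the character data of every text node under `node`."""
    parts = []
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            # Recurse so that nested elements contribute their text too.
            parts.append(get_text(child))
    return "".join(parts)


doc = minidom.parseString(
    "<passwd><entry>"
    "<account>dalke</account>"
    "<shell>/bin/tcsh</shell>"
    "</entry></passwd>"
)
shell = doc.getElementsByTagName("shell")[0]
shell.firstChild.data = "/bin/bash"  # set the text through `data`, as above

shell_text = get_text(shell)                # text of one element
entry_text = get_text(doc.documentElement)  # text of the whole document
print(shell_text)
print(entry_text)
```

The helper returns plain strings with no markup, so the result can be
written anywhere ordinary text is expected.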

I would prefer an API that looks more like this:

for entry in doc["entry"]:
  if entry["account"][0].text == "dalke":
    entry["shell"][0].text = "/bin/bash"
    break

and not have to worry about the normalization and explicit
use of firstChild.
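
An interface of that shape could be layered on top of minidom.  The
following is only a sketch under stated assumptions: Wrapped is a
hypothetical class name, it targets a current Python 3 minidom, and it
treats ["tag"] as "all descendant elements with that name", which is
one possible reading of the loop above.

```python
from xml.dom import minidom


class Wrapped:
    """Hypothetical thin wrapper around a minidom Document or Element."""

    def __init__(self, node):
        self.node = node

    def __getitem__(self, tag):
        # All descendant elements with this tag name, each wrapped.
        return [Wrapped(el) for el in self.node.getElementsByTagName(tag)]

    @property
    def text(self):
        # Join the data of the direct text children.
        return "".join(
            child.data
            for child in self.node.childNodes
            if child.nodeType == child.TEXT_NODE
        )

    @text.setter
    def text(self, value):
        # Drop every existing child, then add a single new text node.
        while self.node.firstChild is not None:
            self.node.removeChild(self.node.firstChild).unlink()
        self.node.appendChild(self.node.ownerDocument.createTextNode(value))


doc = Wrapped(minidom.parseString(
    "<passwd><entry>"
    "<account>dalke</account>"
    "<shell>/bin/tcsh</shell>"
    "</entry></passwd>"
))

for entry in doc["entry"]:
    if entry["account"][0].text == "dalke":
        entry["shell"][0].text = "/bin/bash"
        break

result = doc.node.toxml()
print(result)
```

Because the setter replaces the children outright, the caller does not
need to call normalize() or touch firstChild.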

I also haven't seen any documentation that introduces Python
programmers to DOM (as AMK's did for SAX parsing), only docs
for people who already know DOM from Java or other fields.

So I too am looking for pointers.

                    Andrew
                    dalke at dalkescientific.com





