Problem with processing XML

John Carlyle-Clarke jpcc at nowhere.org
Tue Jan 22 09:11:54 EST 2008


Hi.

I'm new to Python and trying to use it to solve a specific problem.  I 
have an XML file in which I need to locate a specific text node and 
replace the contents with some other text.  The text in question is 
actually about 70k of base64 encoded data.

I wrote some code that works on my Linux box using xml.dom.minidom, but 
it will not run on the Windows box that I really need it on.  Python 
2.5.1 on both.

On the Windows machine, it's a clean install of the Python .msi from 
python.org.  The Linux box is Ubuntu 7.10, which has some Python XML 
packages installed which can't easily be removed (namely python-libxml2 
and python-xml).

I have boiled the code down to its simplest form which shows the problem:-

import xml.dom.minidom
import sys

input_file = sys.argv[1]
output_file = sys.argv[2]

doc = xml.dom.minidom.parse(input_file)
file = open(output_file, "w")
doc.writexml(file)
file.close()

The error is:-

$ python test2.py input2.xml output.xml
Traceback (most recent call last):
   File "test2.py", line 9, in <module>
     doc.writexml(file)
   File "c:\Python25\lib\xml\dom\minidom.py", line 1744, in writexml
     node.writexml(writer, indent, addindent, newl)
   File "c:\Python25\lib\xml\dom\minidom.py", line 814, in writexml
     node.writexml(writer,indent+addindent,addindent,newl)
   File "c:\Python25\lib\xml\dom\minidom.py", line 809, in writexml
     _write_data(writer, attrs[a_name].value)
   File "c:\Python25\lib\xml\dom\minidom.py", line 299, in _write_data
     data = data.replace("&", "&amp;").replace("<", "&lt;")
AttributeError: 'NoneType' object has no attribute 'replace'
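
The traceback suggests that some attribute in the parsed document has a 
value of None, which _write_data then tries to call replace() on.  Below 
is a minimal sketch for locating such attributes, assuming that reading 
of the traceback is right.  find_none_attributes is a made-up helper 
name, not part of minidom.

```python
import xml.dom.minidom


def find_none_attributes(node, found=None):
    """Return (element name, attribute name) pairs whose value is None.

    Hypothetical diagnostic helper: it only walks the tree and reports,
    it does not change the document.
    """
    if found is None:
        found = []
    if node.nodeType == node.ELEMENT_NODE:
        attrs = node.attributes  # NamedNodeMap of this element's attributes
        for i in range(attrs.length):
            attr = attrs.item(i)
            if attr.value is None:
                found.append((node.nodeName, attr.name))
    # Recurse into children (text and comment nodes have none)
    for child in node.childNodes:
        find_none_attributes(child, found)
    return found
```

Calling find_none_attributes(doc) on the parsed document, just before 
doc.writexml(file), should list the elements and attribute names to look 
at in the input file.  An empty list would mean the guess about None 
attribute values is wrong.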

As I said, this code runs fine on the Ubuntu box.  If I could work out 
why the code runs on that box, that would help because then I could set 
up the Windows box the same way.

The input file contains an <xsd:schema> block which is what actually 
causes the problem.  If you remove that node and subnodes, it works 
fine.  For a while at least, you can view the input file at 
http://rafb.net/p/5R1JlW12.html

Someone suggested that I should try xml.etree.ElementTree.  However, 
writing the same type of simple code to import and then write the file 
mangles the xsd:schema stuff, because ElementTree does not seem to 
preserve the original namespace prefixes.
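
A small sketch of that ElementTree round trip, using a made-up schema 
fragment rather than my real input file.  SAMPLE and round_trip are 
names invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real input: the xsd prefix is used both
# on element names and inside an attribute value (type="xsd:string").
SAMPLE = (
    '<root>'
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<xsd:element name="payload" type="xsd:string"/>'
    '</xsd:schema>'
    '</root>'
)


def round_trip(xml_text):
    """Parse xml_text with ElementTree and serialise it again."""
    root = ET.fromstring(xml_text)
    # tostring() returns ASCII-encoded bytes by default; decode to text
    return ET.tostring(root).decode("ascii")


result = round_trip(SAMPLE)
```

In the result the elements come back under a different prefix and the 
xmlns:xsd declaration is gone, but the attribute value still says 
xsd:string, so that reference no longer points at a declared prefix.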

By the way, is pyxml a live project or not?  Should it still be used? 
It's odd that if you go to http://www.python.org/ and click the link 
"Using python for..." XML, it leads you to 
http://pyxml.sourceforge.net/topics/

If you then follow the download links to 
http://sourceforge.net/project/showfiles.php?group_id=6473 you see that 
the latest file is from 2004, and there are no versions for newer 
Pythons.  It also says "PyXML is no longer maintained".  Shouldn't the 
link be removed from python.org?

Thanks in advance!


