XML expat error

dirkheld dirkheld at gmail.com
Wed Feb 27 10:26:05 EST 2008


Hi,

I have written a piece of code that reads all XML files in a directory
in order to retrieve one element from each of them. All files
have the same XML structure. After file 123 I receive the following
error:

xml.parsers.expat.ExpatError: not well-formed (invalid token): line 554, column 20

I guess that the element I am trying to read, or the XML itself (which
would be strange, since all the files were created with the same code),
can't be retrieved.

Is there a way to:
1. fix this problem so that I can retrieve the element, and
2. skip the invalid file after such an error, so that the program
continues reading the subsequent files? Some sort of error handling?
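
For point 1, one way to see what is wrong is to look at the exact spot the
parser complains about. This is only a minimal sketch, assuming Python 3 and
the standard library; `describe_parse_error` is a hypothetical helper name,
not part of the code in this post. It relies on the `lineno` and `offset`
attributes that `ExpatError` carries:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def describe_parse_error(xml_path):
    """Return None if xml_path parses, else a dict describing the failing spot.

    Hypothetical helper for illustration only.
    """
    try:
        minidom.parse(xml_path)
    except ExpatError as err:
        # Read the raw bytes so that a bad encoding cannot raise a second error.
        with open(xml_path, "rb") as handle:
            lines = handle.read().splitlines()
        # err.lineno is 1-based; guard against an out-of-range value.
        if 0 < err.lineno <= len(lines):
            bad_line = lines[err.lineno - 1]
        else:
            bad_line = b""
        return {
            "message": str(err),
            "line": err.lineno,
            "column": err.offset,
            "text": bad_line,
        }
    return None
```

Printing the returned "text" for the file that fails should show the
offending line (line 554 in the error above). An "invalid token" is often
something like an unescaped `&` or a stray control character, but that is a
guess until the line has been inspected.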

Here is the code I use:

from xml.dom import minidom
import os

path = "/Documents/programming/data/xml/"

# List every entry in the directory, then open the output file.
dirList = os.listdir(path)
url_file = open(os.path.join(path, 'test.txt'), 'w')
for filename in dirList:
    # Parse the file and take the 'uri' attribute of the first <webpage> element.
    xmldoc = minidom.parse(os.path.join(path, filename))
    xml_elem = xmldoc.getElementsByTagName('webpage')
    web_elem = xml_elem[0]
    url = web_elem.attributes['uri']
    url_file.write(url.value + '\n')
url_file.close()
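
For point 2, one possible approach is to wrap the parse call in try/except
and carry on with the next file. The following is a sketch under
assumptions, not a definitive fix: it assumes Python 3, that the input files
end in ".xml", and that only the first `webpage` element matters. The name
`collect_uris` and its `tag`/`attr` parameters are hypothetical.

```python
import os
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def collect_uris(directory, tag="webpage", attr="uri"):
    """Return (uris, skipped) for the .xml files found in directory.

    uris    -- attribute values taken from the first <tag> element of each file
    skipped -- (filename, error message) pairs for files that failed to parse
    """
    uris, skipped = [], []
    for name in sorted(os.listdir(directory)):
        # Assumption: only *.xml files are input; this also keeps an output
        # file such as test.txt in the same directory from being parsed.
        if not name.endswith(".xml"):
            continue
        full_path = os.path.join(directory, name)
        try:
            doc = minidom.parse(full_path)
        except ExpatError as err:
            # Remember the bad file and move on to the next one.
            skipped.append((name, str(err)))
            continue
        elems = doc.getElementsByTagName(tag)
        if elems and elems[0].hasAttribute(attr):
            uris.append(elems[0].getAttribute(attr))
    return uris, skipped
```

The returned `uris` list can then be written to the output file one per
line, and `skipped` shows which files need a closer look. Catching only
`ExpatError` is a deliberate choice here, so that unrelated problems (for
example a missing directory) still surface as errors.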


