[Expat-discuss] & symbol workaround

Brad Causey bradcausey at gmail.com
Wed Feb 4 20:56:11 CET 2009


Hi list,

I am working on a Python script that parses around 6800 small XML files.
My code isn't pretty, as I am just testing a proof of concept (PoC) at this
point, but I have run into a problem. When the script hits an ampersand (&)
in a file, it quits with:

xml.parsers.expat.ExpatError: not well-formed (invalid token): line 28, column 41
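The error comes from the XML rules themselves rather than from the script. A literal & in character data or an attribute value must be written as &amp; (or as a character reference such as &#38;), so expat is required to reject a bare one. A minimal reproduction, assuming Python 3 and a hypothetical `<Product>` element and `try_parse` helper:

```python
import xml.parsers.expat


def try_parse(doc):
    """Parse a complete document given as bytes.

    Returns None on success, or the ExpatError message on failure.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(doc, True)  # True: this is the whole document
    except xml.parsers.expat.ExpatError as err:
        return str(err)
    return None


print(try_parse(b"<Product>AT & T</Product>"))      # prints expat's "not well-formed" message
print(try_parse(b"<Product>AT &amp; T</Product>"))  # prints None
```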

I am looking for a way to work around this without modifying the XML files
themselves, since they need to be preserved in their original format.
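Expat does not offer an option for accepting a bare &. One workaround that leaves the files on disk untouched is to read each file into memory, escape only the stray ampersands in that copy, and hand the repaired bytes to the parser. The following is a minimal sketch. It assumes Python 3 and treats as "stray" any & that does not start an entity reference (&amp;, &lt;, ...) or a character reference (&#38;, &#x26;). The helper names `escape_stray_ampersands`, `parse_bytes_tolerantly` and `parse_file_tolerantly` are hypothetical. The regex is not aware of CDATA sections or comments, where a literal & is legal, so it would rewrite those too. A reference to an undeclared entity such as &nbsp; is left alone and will still be rejected.

```python
import re
import xml.parsers.expat

# An "&" that does NOT begin an entity reference (&amp;, &lt;, ...) or a
# character reference (&#38;, &#x26;) is treated as a stray ampersand.
STRAY_AMP = re.compile(rb"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_stray_ampersands(raw):
    """Return a copy of raw (bytes) with every stray '&' turned into '&amp;'."""
    return STRAY_AMP.sub(b"&amp;", raw)


def parse_bytes_tolerantly(raw, start, end, chars):
    """Feed the repaired bytes to a fresh expat parser with the given handlers."""
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(escape_stray_ampersands(raw), True)  # True: this is all the data


def parse_file_tolerantly(path, start, end, chars):
    """Read a file without modifying it and parse a repaired in-memory copy."""
    with open(path, "rb") as f:  # opened read-only; the file on disk is never written
        raw = f.read()
    parse_bytes_tolerantly(raw, start, end, chars)
```

With handler functions like start_element, end_element and char_data, each file would be parsed with a call such as `parse_file_tolerantly(path, start_element, end_element, char_data)`. Expat may deliver the text around the &amp; reference as several character-data callbacks, so code that expects one text chunk per element may need to join them.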

Here is my code:
<begin code>
import glob
import os
import xml.parsers.expat

# Element names whose values are collected for each file
indexy = ('RulesVersion', 'AuditDate', 'ComputerName', 'UserName',
          'UserDomain', 'OSName', 'OSServicePack', 'OSBuild',
          'AntiVirusProduct', 'ExeVersion', 'SigsVersion', 'Active',
          'Timeout', 'PasswordRequired', 'PasswordLength', 'Modem',
          'Dialtone')
out = open('test.txt', 'w')  # output file for results (not written to yet)

# Flat list of every element name and text chunk seen in the current file
tokens = []

# Expat handler functions: record names and text in document order
def start_element(name, attrs):
    tokens.append(str(name))

def end_element(name):
    tokens.append(str(name))

def char_data(data):
    tokens.append(str(data))

# File parsing: oldest file first, like "dir /od /a-d /b *.xml", but portable
for filename in sorted(glob.glob('*.xml'), key=os.path.getmtime):
    print(filename)
    del tokens[:]  # start each file with an empty token list
    values = [filename]
    # A new parser per file: an expat parser cannot be reused
    p = xml.parsers.expat.ParserCreate('ASCII')
    p.StartElementHandler = start_element
    p.EndElementHandler = end_element
    p.CharacterDataHandler = char_data
    with open(filename, 'rb') as f:  # expat reads bytes, so open in binary mode
        p.ParseFile(f)               # raises ExpatError on a bare '&'
    for item in indexy:
        try:
            pos = tokens.index(item)
            # For these two tags the wanted text is three tokens past the
            # opening tag (presumably past a nested element); otherwise one
            offset = 3 if item in ('AntiVirusProduct', 'Modem') else 1
            values.append(tokens[pos + offset])
        except (ValueError, IndexError):  # tag missing, or no token after it
            values.append('NOT FOUND')
    print(values)
<end code>
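With 6800 files, it may also help to keep going when one file is malformed and report where it failed, instead of stopping at the first ExpatError. The following is a minimal sketch, assuming Python 3. `parse_each` is a hypothetical helper, and it relies only on the `lineno` and `offset` attributes that ExpatError provides:

```python
import xml.parsers.expat


def parse_each(paths):
    """Try to parse every file; return (parsed_paths, failures).

    Each failure is a (path, line, column, message) tuple.
    """
    parsed, failures = [], []
    for path in paths:
        parser = xml.parsers.expat.ParserCreate()  # fresh parser for every file
        try:
            with open(path, "rb") as f:
                parser.ParseFile(f)
        except xml.parsers.expat.ExpatError as err:
            # err.lineno counts from 1, err.offset (the column) from 0
            failures.append((path, err.lineno, err.offset, str(err)))
        else:
            parsed.append(path)
    return parsed, failures
```

For example, `parse_each(sorted(glob.glob('*.xml'), key=os.path.getmtime))` processes the files oldest first, much like `dir /od`. The returned failure list then shows which files still contain characters that expat rejects.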

-B

