xml processing : too slow...

Fredrik Lundh fredrik at pythonware.com
Wed Jul 24 14:11:27 EDT 2002


"Shagshag13" wrote:

> * check well-formedness
> * process each line, which looks like this:
> <tag0><tag1> 1 2 </tag1><tag2 attr="value">3</tag2></tag0>
> -> to get:
> ['<tag0>', '<tag1>', '1', '2', '</tag1>', '<tag2 attr="value">', '3', '</tag2>', '</tag0>']

from xml.parsers import expat
parser = expat.ParserCreate(None, None)

import re
# match either a complete tag or a run of digits
p = re.compile(r"<[^>]*>|\d+")

# "file" stands for an already opened file object
for line in file:
    # check wellformedness
    parser.Parse(line, 0)
    # split into parts
    print(p.findall(line))

# check for trailing junk
parser.Parse("", 1)
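
If every line is a complete document of its own (an assumption; the question doesn't say whether the lines share a single root element), one shared parser will reject the second root element. A minimal sketch of a per-line variant follows. The names split_line and TOKEN are made up for illustration:

```python
import re
from xml.parsers import expat

# same pattern as above: a complete tag, or a run of digits
TOKEN = re.compile(r"<[^>]*>|\d+")

def split_line(line):
    """Return the tags and numbers found in one line.

    Raises expat.ExpatError if the line is not well-formed on its own.
    """
    parser = expat.ParserCreate()  # fresh parser: one document per line
    parser.Parse(line, True)       # True marks the final chunk, so unclosed tags are reported
    return TOKEN.findall(line)

print(split_line('<tag0><tag1> 1 2 </tag1><tag2 attr="value">3</tag2></tag0>'))
```

Creating a parser per line costs a little extra, but both steps are still done in C (expat and the re engine), so this should remain much faster than building a tree for each line.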

</F>




