Trying to parse a HUGE(1gb) xml file

spaceman-spiff ashish.makani at gmail.com
Mon Dec 20 15:29:01 EST 2010


Hi Usernet

First up, thanks for your prompt reply.
I will make sure I read RFC 1855 before posting again, but right now I am chasing a hard deadline :)

I am sorry I left out what exactly I am trying to do.

0. Goal: I am looking for a specific element. There are tens to hundreds of occurrences of that element in the 1 GB XML file.
The XML is just a dump of config parameters from a packet switch (although IMHO the contents of the XML don't matter).

I need to detect them and then, for each one, copy all the content between the element's start and end tags and create a smaller XML file.

1. Can you point me to some examples/samples of using SAX, especially ones dealing with really large XML files?
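
A minimal sketch of what SAX-based extraction for this task might look like, using only the standard library's xml.sax. The class name ElementExtractor, the helper extract_to_files, the output file prefix, and the element name passed as target are all hypothetical, chosen for illustration. The sketch assumes the extracted fragments are small enough to hold in memory; the 1 GB input itself is streamed, not loaded.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class ElementExtractor(xml.sax.ContentHandler):
    """Collect each occurrence of `target` as its own small XML document."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.depth = 0        # nesting depth inside the element being copied
        self.buffer = None    # text buffer for the current fragment
        self.writer = None    # re-serializes SAX events into the buffer
        self.fragments = []   # finished fragments, one string per occurrence

    def startElement(self, name, attrs):
        # Start a new fragment when the target element opens at top level.
        if self.writer is None and name == self.target:
            self.buffer = io.StringIO()
            self.writer = XMLGenerator(self.buffer, encoding="utf-8")
            self.writer.startDocument()
        if self.writer is not None:
            self.depth += 1
            self.writer.startElement(name, attrs)

    def characters(self, content):
        if self.writer is not None:
            self.writer.characters(content)

    def endElement(self, name):
        if self.writer is None:
            return
        self.writer.endElement(name)
        self.depth -= 1
        if self.depth == 0:
            # The target element has closed: store the finished fragment.
            self.writer.endDocument()
            self.fragments.append(self.buffer.getvalue())
            self.buffer = self.writer = None


def extract_to_files(source, target, prefix="fragment"):
    """Parse `source` (path or file object) and write one file per match."""
    handler = ElementExtractor(target)
    xml.sax.parse(source, handler)  # the parser reads the input incrementally
    for i, fragment in enumerate(handler.fragments, 1):
        with open(f"{prefix}_{i}.xml", "w", encoding="utf-8") as out:
            out.write(fragment)
    return len(handler.fragments)
```

The depth counter means a target element nested inside another target element is copied as part of the outer one instead of starting a second fragment.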

2. This brings me to another question, which I forgot to ask in my OP (original post).
Is simply opening the file and using a regex to look for the element I need a *good* approach?
While researching my problem, some articles seemed to advise against this, especially since it is known a priori that the file is XML, and since regex code gets complicated very quickly and is not very readable.

But is that just a "style"/"elegance" issue? For my particular problem (detecting a certain element, then writing a smaller XML file for each pair of start and end tags of that element), is the open-file-and-regex approach something you would recommend?
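
For comparison, the open-file-and-regex approach described above could be sketched roughly as follows. The function name find_elements is hypothetical, and the sketch assumes ASCII tag names and no namespaces. It is deliberately naive: self-closing tags, nested elements with the same name, and look-alike text inside comments or CDATA sections are not handled correctly, which suggests the concern is more than one of style.

```python
import re


def find_elements(data, tag):
    """Yield each <tag ...>...</tag> span found in a bytes-like object."""
    name = re.escape(tag.encode("ascii"))
    # Opening tag with optional attributes, then the shortest run of
    # content up to the first matching closing tag.
    pattern = re.compile(
        rb"<" + name + rb"(?:\s[^>]*)?>.*?</" + name + rb"\s*>",
        re.DOTALL,
    )
    for match in pattern.finditer(data):
        yield match.group(0)
```

Since the pattern needs the whole input as one bytes-like object, a 1 GB file would have to be read into memory or memory-mapped (for example with the standard mmap module) before searching.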

Thanks again for your super-prompt response :)

cheers
ashish


