Trying to parse a HUGE(1gb) xml file

Adam Tauno Williams awilliam at whitemice.org
Mon Dec 20 15:33:45 EST 2010


On Mon, 2010-12-20 at 12:29 -0800, spaceman-spiff wrote:
> I need to detect them and then, for each one, copy all the
> content between the element's start and end tags and create a
> smaller XML file.

Yep, I do that a lot, via iterparse.
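
For illustration only (this is not the code from my project), here is a
minimal sketch of that approach using the standard library's
xml.etree.ElementTree.iterparse. The function name split_elements, the
output file naming, and the assumption that the wanted elements are not
nested inside each other are all mine, not something from your file.

```python
import os
import xml.etree.ElementTree as ET


def split_elements(source, tag, out_dir):
    """Write every <tag> element in source to its own small XML file.

    source may be a filename or a binary file object. Assumes (hypothetically)
    that elements named `tag` are never nested inside one another.
    """
    paths = []
    # Only "end" events: the element is complete when we see it.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != tag:
            continue
        elem.tail = None  # do not carry trailing whitespace into the new file
        path = os.path.join(out_dir, "%s_%06d.xml" % (tag, len(paths)))
        ET.ElementTree(elem).write(path, encoding="utf-8", xml_declaration=True)
        paths.append(path)
        elem.clear()  # release the subtree that was just written
    return paths
```

Calling split_elements("huge.xml", "record", "out") would write
out/record_000000.xml, out/record_000001.xml, and so on ("huge.xml" and
"record" are placeholder names). Note that elem.clear() leaves an empty
shell of each element attached to its parent; for very large counts you
may also want to clear the parent periodically.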

> 1. Can you point me to some examples/samples of using SAX,
> especially ones dealing with really large XML files?

SAX is roughly equivalent to iterparse (iterparse is essentially a way
to do SAX-like processing).
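
To make the comparison concrete, here is a small sketch that counts
elements both ways, once with an xml.sax ContentHandler and once with
iterparse. The names TagCounter, count_with_sax, and
count_with_iterparse are made up for this example, and it assumes
element names without namespaces.

```python
import xml.sax
import xml.etree.ElementTree as ET


class TagCounter(xml.sax.ContentHandler):
    """SAX handler: the parser calls back into us for each closing tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def endElement(self, name):
        if name == self.tag:
            self.count += 1


def count_with_sax(source, tag):
    """Count <tag> elements using callback-style SAX parsing."""
    handler = TagCounter(tag)
    xml.sax.parse(source, handler)
    return handler.count


def count_with_iterparse(source, tag):
    """Count <tag> elements by pulling "end" events from iterparse."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
        elem.clear()  # discard content we no longer need
    return count
```

Both functions stream through the input instead of building the whole
tree. The practical difference is that SAX pushes events into your
handler, while iterparse lets you pull them in an ordinary for loop and
hands you a complete element at each "end" event.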

I provided an iterparse example already. See the Read_Rows method in 
<http://coils.hg.sourceforge.net/hgweb/coils/coils/file/62335a211fda/src/coils/foundation/standard_xml.py>

> 2. This brings me to another question, which I forgot to ask in my OP (original post).
> Is simply opening the file and using a regex to look for the element I need a *good* approach?

No.




