Large XML Document Processing

uche.ogbuji at gmail.com
Tue Feb 7 13:02:07 EST 2006


Albert Leibbrandt wrote:
> Hi
>
> Just want to check which xml parser you guys have found to be the
> quickest. I have xml documents with 250 000 records or more and the
> processing of these documents is taking way too long. The validation is
> the main problem. Any module names, non-validating would be fine too,
> would help a lot.

It would help us help you if you posted samples of the target docs.
XML processing strategy often depends on the structure of the XML, just
as relational query optimization strategy often depends on the schema.
In general SAX or iterative tree-callback methods will give you the
best speed.  Fredrik already mentioned ElementTree's iterparse.
Amara's pushbind and pushdom and 4Suite's Saxlette (which has some neat
callback features) are other options.

http://uche.ogbuji.net/tech/4suite/amara/
http://4suite.org/docs/CoreManual.xml#saxlette
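
To make the iterparse suggestion above concrete, here is a minimal
sketch. It assumes the ElementTree that ships in the standard library as
xml.etree.ElementTree, and it assumes the records are direct children of
the root element. The tag and field names ("record", "id", "amount") and
the iter_records helper are made up for illustration, since the real
document structure hasn't been posted.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real documents would be far larger.
SAMPLE = (
    b"<records>"
    b"<record id='1'><amount>10</amount></record>"
    b"<record id='2'><amount>32</amount></record>"
    b"</records>"
)

def iter_records(source, tag="record"):
    """Yield (id, amount) pairs one record at a time.

    Assumes each <record> is a direct child of the root element.
    """
    # Request "start" events as well so the first event hands us the root.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            # The record is fully parsed here; pull out what we need.
            yield elem.get("id"), elem.findtext("amount")
            # Drop the finished record from the root so the in-memory
            # tree does not grow with the size of the document.
            root.clear()

if __name__ == "__main__":
    for rec_id, amount in iter_records(io.BytesIO(SAMPLE)):
        print(rec_id, amount)
```

The point of clearing the root after each record is that memory use
stays roughly flat no matter how many records the file holds. Note that
this does no validation at all, so it only fits if skipping validation
is acceptable.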

--
Uche Ogbuji                               Fourthought, Inc.
http://uche.ogbuji.net                    http://fourthought.com
http://copia.ogbuji.net                   http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/




More information about the Python-list mailing list