Large XML Document Processing

Fredrik Lundh fredrik at pythonware.com
Fri Jan 27 03:50:58 EST 2006


Albert Leibbrandt wrote:

> Just want to check which xml parser you guys have found to be the
> quickest. I have xml documents with 250 000 records or more, and the
> processing of these documents is taking way too long. The validation is
> the main problem. Any module names (non-validating would be fine too)
> would help a lot.

if the files are regular (e.g. all records belong to the same toplevel
element), the iterparse approach is hard to beat:

    http://effbot.org/zone/element-iterparse.htm

especially if you use the cElementTree implementation:

    http://effbot.org/zone/celementtree.htm
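a minimal sketch of the iterparse pattern, assuming every record is a
direct child of the root element and uses the tag "record" (both the tag
name and the iter_records helper are hypothetical, for illustration).
it yields each record once its end tag has been parsed, then clears the
root so already-processed records don't pile up in memory. the import
prefers cElementTree where it exists; in Python 3.3 and later the plain
ElementTree module uses the C accelerator automatically, and the
cElementTree alias was removed in 3.9.

```python
import io

try:
    # older Pythons: explicitly request the C implementation
    import xml.etree.cElementTree as ET
except ImportError:
    # Python 3.9+: cElementTree is gone; ElementTree is already C-accelerated
    import xml.etree.ElementTree as ET


def iter_records(source, tag="record"):
    """Yield each <tag> element as soon as it is complete, then free it.

    Assumes (hypothetically) that records are direct children of the root.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is ("start", root element)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem
            root.clear()  # drop processed records to keep memory flat


if __name__ == "__main__":
    data = b"<records><record id='1'>a</record><record id='2'>b</record></records>"
    for rec in iter_records(io.BytesIO(data)):
        print(rec.get("id"), rec.text)
```

pull whatever you need out of each record inside the loop; once the
generator resumes, that record is detached from the tree.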

</F>