Python parsing XML file problem with SAX

Aahz aahz at pythoncraft.com
Mon Aug 9 19:20:33 EDT 2010


In article <mailman.1860.1281375095.1673.python-list at python.org>,
Stefan Behnel  <stefan_ml at behnel.de> wrote:
>Aahz, 09.08.2010 18:52:
>> In article <mailman.1250.1280314148.1673.python-list at python.org>,
>> Stefan Behnel wrote:
>>>
>>> First of all: don't use SAX. Use ElementTree's iterparse() function. That
>>> will shrink your code down to a simple loop in a few lines.
>>
>> Unless I'm missing something, that only helps if the final tree fits into
>> memory.  What do you suggest other than SAX if your XML file may be
>> hundreds of megabytes?
>
>Well, what about using ElementTree's iterparse() function in that case? 
>That's what it's good at, and its cElementTree version is extremely fast.

The docs say, "Parses an XML section into an element tree incrementally".
Sure sounds like it retains the entire parsed tree in RAM.  Not good.
Again, how do you parse an XML file larger than your available memory
using something other than SAX?
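
For reference, the pattern usually meant by the iterparse() suggestion is
to discard each element as soon as it has been handled, so the tree never
grows to the size of the file. A minimal sketch follows; the tag name
"record", the helper count_records(), and the inline sample document are
all hypothetical and do not come from this thread:

```python
import io
import xml.etree.ElementTree as ET

def count_records(source, tag="record"):
    """Stream over `source`, handle each <tag> element, then discard it."""
    count = 0
    # Ask for "start" events as well so the root element can be captured.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            count += 1    # stand-in for real per-record processing
            root.clear()  # drop already-processed children of the root
    return count

# Tiny in-memory stand-in for a large file (assumed layout: records
# sitting directly under the root element).
sample = io.BytesIO(b"<root><record id='1'/><record id='2'/><record id='3'/></root>")
print(count_records(sample))
```

Without the root.clear() call the finished elements stay attached to the
root, which is the whole-tree-in-RAM behaviour described above.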
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"...if I were on life-support, I'd rather have it run by a Gameboy than a
Windows box."  --Cliff Wells


