xml.sax parsing elements with the same name

Stefan Behnel stefan_ml at behnel.de
Tue Jan 12 03:13:35 EST 2010


amadain, 11.01.2010 20:13:
> I have an event log with 100s of thousands of entries with logs of the
> form:
> 
> <event eventTimestamp="2009-12-18T08:22:49.035"
> uniqueId="1261124569.35725_PFS_1_1340035961">
>    <result value="Blocked"/>
>       <filters>
>           <filter code="338" type="Filter_Name">
>               <diagnostic>
>                    <result value="Triggered"/>
>               </diagnostic>
>           </filter>
>           <filter code="338" type="Filter_Name">
>               <diagnostic>
>                    <result value="Blocked"/>
>               </diagnostic>
>           </filter>
>       </filters>
> </event>
> 
> I am using xml.sax to parse the event log.

You should give ElementTree's iterparse() a try (xml.etree package).
Instead of a stream of simple events, it will give you a stream of
subtrees, which are a lot easier to work with. You can intercept the event
stream on each 'event' tag, handle it completely in one obvious code step,
and then delete any content you are done with to save memory.

It's also very fast; you will likely not lose much performance compared to
xml.sax.
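
As a rough sketch of that approach (not tested against the real log): the
code below assumes the events are wrapped in a single root element, here
called <log>, which is a made-up name. The helper iter_events() and the
dict keys are likewise only illustrative. It relies on find() with a bare
tag name matching direct children only, which is what separates the
event's own <result> from the same-named ones nested inside each filter.

```python
import io
import xml.etree.ElementTree as ET

# Sample input; the <log> wrapper is an assumption about the real file.
SAMPLE = b"""<log>
<event eventTimestamp="2009-12-18T08:22:49.035"
 uniqueId="1261124569.35725_PFS_1_1340035961">
   <result value="Blocked"/>
   <filters>
       <filter code="338" type="Filter_Name">
           <diagnostic>
               <result value="Triggered"/>
           </diagnostic>
       </filter>
       <filter code="338" type="Filter_Name">
           <diagnostic>
               <result value="Blocked"/>
           </diagnostic>
       </filter>
   </filters>
</event>
</log>"""


def iter_events(source):
    """Yield one dict per complete <event> subtree, then free the subtree."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "event":
            continue
        # Direct child of <event>: the overall result of the event.
        overall = elem.find("result")
        yield {
            "uniqueId": elem.get("uniqueId"),
            "result": overall.get("value") if overall is not None else None,
            # Same tag name, but reached through the filter path.
            "filter_results": [
                r.get("value")
                for r in elem.findall("filters/filter/diagnostic/result")
            ],
        }
        # Drop the children and attributes we are done with to save memory.
        elem.clear()


events = list(iter_events(io.BytesIO(SAMPLE)))
```

For the real log you would pass the file name or an open file object
instead of the BytesIO wrapper. Note that clear() leaves an empty 'event'
element attached to the root, so with a very large file you may also want
to remove those from the root from time to time.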

Stefan


