[Tutor] parsing XML

Alan Gauld alan.gauld at btinternet.com
Wed Nov 11 00:09:16 CET 2009


"Stefan Behnel" <stefan_ml at behnel.de> wrote

> Note that ElementTree provides both a SAX-like interface (look for the
> 'target' property of parsers) and an incremental parser (iterparse).

Interesting, I didn't realise that.
I've only ever used it to build a tree.
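
For anyone else who has not met them, here is a minimal sketch of the two
interfaces Stefan mentions, written for Python 3. The sample document and the
TagCounter class are invented for illustration; only XMLParser(target=...)
and iterparse come from ElementTree itself.

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample document, purely for illustration.
XML = b"<root><item id='1'>a</item><item id='2'>b</item></root>"


class TagCounter:
    """Parser target: receives SAX-like callbacks, so no tree is built."""

    def __init__(self):
        self.counts = {}

    def start(self, tag, attrib):
        # Called for every opening tag.
        self.counts[tag] = self.counts.get(tag, 0) + 1

    def end(self, tag):
        pass  # Closing tags are not needed for a simple count.

    def data(self, text):
        pass  # Character data is ignored here.

    def close(self):
        # Whatever this returns is what parser.close() returns.
        return self.counts


# 1. SAX-like use: the parser pushes events into the target object.
parser = ET.XMLParser(target=TagCounter())
parser.feed(XML)
counts = parser.close()

# 2. Incremental use: iterparse hands over each element as it completes.
ids = []
for event, elem in ET.iterparse(io.BytesIO(XML), events=("end",)):
    if elem.tag == "item":
        ids.append(elem.get("id"))
        elem.clear()  # Discard the element's contents once we are done with it.

print(counts)
print(ids)
```

The target approach never builds a tree at all, while iterparse still builds
elements but lets you discard them as you go.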

>> XML parsers fall into 2 groups. Those that parse the whole structure and
>> create a tree of objects - usually accessed like a dictionary, and those
>> that parse line by line looking for patterns.
>
> Except that parsing XML is not about lines but about bytes in a stream.

Indeed, I should probably have said element by element.

>> The former approach is usually slightly slower and more resource hungry
>
> I'd better leave the judgement about this statement to a benchmark.

It depends on what you are doing, obviously. If you need to parse the whole
message there will be very little difference, but a sax style parser can often
complete its job after reading a short section of the document.
Tree parsers generally need to read the whole document
to finish building the tree.
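
As a rough sketch of that early exit (Python 3, standard library xml.sax):
the handler below raises an exception as soon as it has the first title, so
the rest of the document is never handed to it. The StopParsing exception,
the FirstTitleHandler class and the sample document are all invented for
illustration, and raising from a handler is just one common way to stop.

```python
import xml.sax


class StopParsing(Exception):
    """Raised by the handler to abandon the parse early (hypothetical name)."""


class FirstTitleHandler(xml.sax.ContentHandler):
    """Collects the text of the first <title> element, then stops."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None
        self.elements_seen = 0
        self._chunks = []

    def startElement(self, name, attrs):
        self.elements_seen += 1
        if name == "title":
            self.in_title = True

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self.in_title:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "title":
            self.title = "".join(self._chunks)
            raise StopParsing  # We have what we need; skip the rest.


# Made-up document: a short header followed by a lot of body content.
DOC = (b"<doc><head><title>Hello</title></head><body>"
       + b"<p>filler</p>" * 1000
       + b"</body></doc>")

handler = FirstTitleHandler()
try:
    xml.sax.parseString(DOC, handler)
except StopParsing:
    pass  # Expected: the handler stopped the parse on purpose.

print(handler.title, handler.elements_seen)
```

Only the doc, head and title elements reach the handler; the thousand
paragraphs after them are never reported.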

>> If SAX makes sense for you and meets your needs go with it.
>
> I'd change this to:
>
> Unless you really know what you are doing and you have proven in benchmarks
> that SAX is substantially faster for the problem at hand, don't use SAX.

Even if speed is not the critical factor, if sax makes more sense to you
than ElementTree, and it will do what you want, use sax. There are plenty of
industrial strength applications using sax parsers, and if the requirement is
simple it is no harder to maintain than a poorly understood ElementTree
implementation!

Personally I find ElementTree easier to work with, but if the OP prefers sax
and can make it work for him then there is nothing wrong with using it.
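
For comparison, a small sketch of the tree style in Python 3. The library
document and the variable names are invented for illustration; fromstring,
findall, findtext and get are standard ElementTree calls.

```python
import xml.etree.ElementTree as ET

# Made-up sample document, purely for illustration.
DOC = (
    "<library>"
    "<book year='2001'><title>A</title></book>"
    "<book year='2009'><title>B</title></book>"
    "</library>"
)

# The whole document is parsed into a tree before we can query it.
root = ET.fromstring(DOC)

# Once the tree exists, navigation reads much like dictionary and list access.
titles_by_year = {
    book.get("year"): book.findtext("title")
    for book in root.findall("book")
}

print(titles_by_year)
```

There are no callbacks or state flags to manage, which is much of the appeal
when the document comfortably fits in memory.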

And familiarity with sax style parsers is arguably a more transferable
skill than ElementTree should he need to work with C or Pascal - or
even Java etc.

-- 
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/



