parsing multiple root element XML into text

Marko Rauhamaa marko at pacujo.net
Fri May 9 08:31:20 EDT 2014


Alain Ketterlin <alain at dpt-info.u-strasbg.fr>:

> Marko Rauhamaa <marko at pacujo.net> writes:
>> Sometimes the XML elements come through a pipe as an endless
>> sequence. You can still use the wrapping technique and a SAX parser.
>> However, the other option is to write a tiny XML scanner that
>> identifies the end of each element. Then, you can cut out the
>> complete XML element and hand it over to a DOM parser.
>
> Well maybe, even though I see no point in doing so. If the whole
> transaction is a single document and you need to get sub-elements on
> the fly, just use the SAX parser: there is no need to use a "tiny XML
> scanner" (whatever that is), and building a DOM for a part of the
> document in your SAX handler is easy if needed (for the OP's case a
> simple state machine would be enough, probably).

An example is <URL:
http://en.wikipedia.org/wiki/XMPP#XMPP_via_HTTP_and_WebSocket_transports>.

There, the "document" is potentially infinitely long, and each
top-level element is a message.

A programmer would rather process each message as a complete DOM tree
than follow the SAX parser's meandering stream of callbacks.
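For what it's worth, here is a minimal sketch of the wrapping
technique in Python. It assumes Python 3.4 or later for
xml.etree.ElementTree.XMLPullParser; the function name
iter_top_level_elements and the wrapper tag "stream" are made up for
illustration. It feeds a synthetic root start tag, then hands out each
completed top-level element as an ordinary ElementTree element (not a
W3C DOM node, but the same idea):

```python
import xml.etree.ElementTree as ET


def iter_top_level_elements(chunks, wrapper="stream"):
    """Yield each complete top-level element found in an iterable of
    text chunks, as an ElementTree Element.

    `wrapper` is a hypothetical name for the synthetic root element
    that makes the endless sequence look like one document.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed("<%s>" % wrapper)   # open the fake root; never closed
    depth = 0
    root = None
    for chunk in chunks:
        # Chunks may split tags anywhere; the parser buffers as needed.
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                if depth == 0:
                    root = elem     # the synthetic wrapper itself
                depth += 1
            else:
                depth -= 1
                if depth == 1:
                    # A direct child of the wrapper just ended.
                    yield elem
                    # Detach it so the wrapper does not grow forever.
                    root.remove(elem)
```

Usage would be something like

    for message in iter_top_level_elements(read_chunks(pipe)):
        handle(message)

where read_chunks and handle are whatever the application provides.
Since the wrapper is never closed, the input can go on indefinitely.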


Marko


