parsing multiple root element XML into text

Alain Ketterlin alain at dpt-info.u-strasbg.fr
Fri May 9 11:50:19 EDT 2014


Marko Rauhamaa <marko at pacujo.net> writes:

> Alain Ketterlin <alain at dpt-info.u-strasbg.fr>:
>
>> Marko Rauhamaa <marko at pacujo.net> writes:
>>> Sometimes the XML elements come through a pipe as an endless
>>> sequence. You can still use the wrapping technique and a SAX parser.
>>> However, the other option is to write a tiny XML scanner that
>>> identifies the end of each element. Then, you can cut out the
>>> complete XML element and hand it over to a DOM parser.
>>
>> Well maybe, even though I see no point in doing so. If the whole
>> transaction is a single document and you need to get sub-elements on
>> the fly, just use the SAX parser: there is no need to use a "tiny XML
>> scanner" (whatever that is), and building a DOM for a part of the
>> document in your SAX handler is easy if needed (for the OP's case a
>> simple state machine would be enough, probably).
>
> An example is <URL:
> http://en.wikipedia.org/wiki/XMPP#XMPP_via_HTTP_and_WebSocket_transports>.
>
> The "document" is potentially infinitely long. The elements are
> messages.
>
> The programmer would rather process the elements as DOM trees than
> follow the meandering SAX parser.

which does an exact traversal of the potential DOM tree... (assuming a
DOM is even defined on a non-well-formed XML document).
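
For what it's worth, here is a rough sketch of the "wrap it and parse
incrementally" idea, handing out one complete subtree per top-level
element. It is only an illustration under assumptions: it uses
xml.etree.ElementTree.XMLPullParser (Python 3.4+) instead of a
hand-written SAX handler, it builds ElementTree elements rather than
W3C DOM nodes, and the function name iter_top_level_elements and the
wrapper tag "stream" are made up for the example.

```python
import xml.etree.ElementTree as ET


def iter_top_level_elements(chunks, wrapper="stream"):
    """Yield each complete top-level element found in an iterable of
    text chunks, as an ElementTree subtree.

    The chunks are parsed as the content of a synthetic root element
    (hypothetical name: `wrapper`) that is opened here and never closed,
    so the input may be arbitrarily long.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed("<%s>" % wrapper)  # open the synthetic root
    depth = 0
    root = None
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
                if depth == 1:
                    root = elem  # the synthetic root itself
            else:  # "end"
                depth -= 1
                if depth == 1:
                    # A direct child of the root is now complete.
                    yield elem
                    # Detach it so the root does not grow without bound.
                    root.remove(elem)


if __name__ == "__main__":
    # Chunk boundaries deliberately fall in the middle of a tag.
    pieces = ["<msg id='1'><body>hi</bo", "dy></msg><msg id='2'/>"]
    for message in iter_top_level_elements(pieces):
        print(message.tag, message.get("id"), message.findtext("body"))
```

The parser does the traversal once; the caller only ever sees finished
subtrees, which is about as close to "process each message as a tree"
as one can get without a separate scanner.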

Anyway, my point was only to warn the OP that he is not doing XML.
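
To make that concrete, a small check (a sketch using the standard
library's ElementTree; the helper name is_well_formed is made up here)
shows that a conforming parser rejects a text with two root elements,
and accepts the same text once it is wrapped in a single root:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as a single XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


two_roots = "<msg>one</msg><msg>two</msg>"
print(is_well_formed(two_roots))                             # False
print(is_well_formed("<stream>" + two_roots + "</stream>"))  # True
```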

-- Alain.


