parsing multiple root element XML into text

Alain Ketterlin alain at dpt-info.u-strasbg.fr
Fri May 9 08:01:43 EDT 2014


Marko Rauhamaa <marko at pacujo.net> writes:

> Alain Ketterlin <alain at dpt-info.u-strasbg.fr>:
>
>> Technically speaking, this is not a well-formed XML document (it is a
>> well-formed external general parsed entity, though). If you have other
>> XML processors in your workflow, they will/should reject it.
>
> Sometimes the XML elements come through a pipe as an endless sequence.
> You can still use the wrapping technique and a SAX parser. However, the
> other option is to write a tiny XML scanner that identifies the end of
> each element. Then, you can cut out the complete XML element and hand it
> over to a DOM parser.

Well maybe, even though I see no point in doing so. If the whole
transaction is a single document and you need to get sub-elements on the
fly, just use the SAX parser: there is no need to use a "tiny XML
scanner" (whatever that is), and building a DOM for a part of the
document in your SAX handler is easy if needed (for the OP's case a
simple state machine would be enough, probably).
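
A minimal sketch of that suggestion, assuming Python's standard-library xml.sax and xml.etree.ElementTree. The names TopLevelHandler and parse_stream are hypothetical. It wraps the stream in a synthetic root element, which makes the sequence of roots a single well-formed document. It then feeds the stream to SAX incrementally, and the handler's state machine is just the nesting depth: each child of the wrapper is rebuilt as an ElementTree element with a TreeBuilder. Namespaces are not handled, because the handler sees plain qualified names. The stream is also assumed to contain no XML declarations between records.

```python
import xml.sax
import xml.etree.ElementTree as ET

class TopLevelHandler(xml.sax.ContentHandler):
    """Hand each complete child of the synthetic root to a callback.

    The nesting depth is the whole state machine: depth 1 is the
    wrapper root, depth 2 starts a new record, deeper levels belong
    to the record currently being built.
    """

    def __init__(self, on_element):
        super().__init__()
        self.on_element = on_element  # called with each finished Element
        self.depth = 0
        self.builder = None           # active TreeBuilder while inside a record

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth == 2:
            self.builder = ET.TreeBuilder()  # a new top-level record begins
        if self.builder is not None:
            self.builder.start(name, dict(attrs.items()))

    def characters(self, content):
        # Text between records (depth 1) is ignored.
        if self.builder is not None:
            self.builder.data(content)

    def endElement(self, name):
        if self.builder is not None:
            self.builder.end(name)
        if self.depth == 2:
            # Record complete: build its small tree and hand it over.
            self.on_element(self.builder.close())
            self.builder = None
        self.depth -= 1

def parse_stream(chunks, on_element):
    """Feed an iterable of text chunks (possibly endless) to SAX."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(TopLevelHandler(on_element))
    parser.feed("<stream>")        # synthetic wrapper root (an assumption)
    for chunk in chunks:
        parser.feed(chunk)         # callbacks fire as soon as records complete
    parser.feed("</stream>")
    parser.close()
```

For example, `parse_stream(['<a x="1">hi</a>', '<b><c/>', '</b>'], print)` prints two elements, `a` and `b`. Each one is delivered as soon as its end tag arrives, even when a record is split across chunks. Nothing in this code needs the stream to end before records are processed.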

-- Alain.
