parsing multiple root element XML into text

Marko Rauhamaa marko at pacujo.net
Fri May 9 06:33:37 EDT 2014


Alain Ketterlin <alain at dpt-info.u-strasbg.fr>:

> Technically speaking, this is not a well-formed XML document (it is a
> well-formed external general parsed entity, though). If you have other
> XML processors in your workflow, they will/should reject it.

Sometimes the XML elements come through a pipe as an endless sequence.
You can still use the wrapping technique and a SAX parser. However, the
other option is to write a tiny XML scanner that identifies the end of
each element. Then, you can cut out the complete XML element and hand it
over to a DOM parser.
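
Here is a minimal sketch of the wrapping technique, assuming Python's
standard xml.sax module and its default expat-based incremental parser.
The names TopLevelHandler and parse_stream and the synthetic <wrapper>
element are made up for illustration. The handler only collects text
content; a real one would do whatever the application needs.

```python
import xml.sax


class TopLevelHandler(xml.sax.ContentHandler):
    """Report each child of the synthetic wrapper once it is closed."""

    def __init__(self, on_element):
        super().__init__()
        self.depth = 0          # 1 = inside the wrapper, 2 = inside a child
        self.text = []
        self.on_element = on_element

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth == 2:     # a new top-level element of the stream
            self.text = []

    def characters(self, content):
        if self.depth >= 2:
            self.text.append(content)

    def endElement(self, name):
        if self.depth == 2:     # the top-level element is now complete
            self.on_element(name, "".join(self.text))
        self.depth -= 1


def parse_stream(chunks, on_element):
    """Feed an iterable of text chunks to a SAX parser under a fake root."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(TopLevelHandler(on_element))
    parser.feed("<wrapper>")    # synthetic root makes the stream one document
    for chunk in chunks:
        parser.feed(chunk)
    # Reached only if the input ends; an endless pipe never gets here.
    parser.feed("</wrapper>")
    parser.close()
```

The callback fires once the parser has processed an element's end tag,
so nothing needs to be held in memory beyond the current element.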

Such a scanner can be really small and nonrecursive because of the
well-formedness rules of XML.
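
As a rough sketch of what such a scanner could look like (my own
illustration, not a tested library): it counts nesting depth and skips
comments, CDATA sections, processing instructions and quoted attribute
values. It assumes well-formed input with no DOCTYPE declaration. The
names split_elements, _tag_end and parse_available are hypothetical.

```python
import xml.dom.minidom


def _tag_end(buf, lt):
    """Index of the '>' closing the tag at buf[lt], or -1 if incomplete."""
    quote = None
    for i in range(lt + 1, len(buf)):
        c = buf[i]
        if quote:
            if c == quote:
                quote = None
        elif c in "\"'":
            quote = c           # '>' inside an attribute value is ignored
        elif c == ">":
            return i
    return -1


def split_elements(buf):
    """Return (list of complete top-level elements, unconsumed tail)."""
    elements, depth, start, consumed, pos = [], 0, 0, 0, 0
    skips = (("<!--", "-->"), ("<![CDATA[", "]]>"), ("<?", "?>"))
    while True:
        lt = buf.find("<", pos)
        if lt < 0:
            break
        for opener, closer in skips:
            if buf.startswith(opener, lt):
                end = buf.find(closer, lt + len(opener))
                pos = end + len(closer) if end >= 0 else -1
                break
        else:
            gt = _tag_end(buf, lt)
            if gt < 0:
                break                       # tag not complete yet
            if buf.startswith("/", lt + 1):
                depth -= 1                  # end tag
            else:
                if depth == 0:
                    start = lt              # a top-level element begins
                if buf[gt - 1] != "/":
                    depth += 1              # not an empty-element tag
            if depth == 0:
                elements.append(buf[start:gt + 1])
                consumed = gt + 1
            pos = gt + 1
        if pos < 0:
            break                           # comment/CDATA/PI not complete
    return elements, buf[consumed:]


def parse_available(buf):
    """Hand each complete element to a DOM parser; return (docs, tail)."""
    elements, rest = split_elements(buf)
    return [xml.dom.minidom.parseString(e) for e in elements], rest
```

The caller keeps the returned tail, appends whatever arrives next on the
pipe, and calls again. Rescanning the tail from its start is wasteful
for very large elements but keeps the sketch short.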


Marko


