[XML-SIG] parsing xml files delimited with non-xml text

Matt Gushee <mgushee@havenrock.com>
Tue, 23 Apr 2002 11:39:19 -0600


On Tue, Apr 23, 2002 at 11:57:48AM -0500, Brian Birkinbine wrote:
> 
> I would prefer to use exception handling because my functions to strip out non-xml data
> would have to recognize the start of an xml file, and the xml parser already knows
> how to detect the start of xml data.

Not really. The parser *assumes* its input is well-formed XML; anything
else is simply an error to it. No XML parser I know of (except possibly
MSXML) is designed to detect XML embedded in non-XML.
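
To illustrate the point, here is a minimal sketch using the standard
xml.sax module. The helper name parses_cleanly is made up for this
example. It only shows that leading or trailing non-XML text makes the
parse fail outright; the parser makes no attempt to locate the XML.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def parses_cleanly(data: bytes) -> bool:
    """Return True if `data` parses as XML, False if the parser rejects it.

    Hypothetical helper for illustration; it is not part of xml.sax.
    """
    try:
        # The do-nothing ContentHandler ignores all events; we only care
        # whether the parser raises.
        xml.sax.parseString(data, ContentHandler())
        return True
    except xml.sax.SAXParseException:
        return False


if __name__ == "__main__":
    print(parses_cleanly(b"<doc><a/></doc>"))
    print(parses_cleanly(b"some log line\n<doc><a/></doc>"))
    print(parses_cleanly(b"<doc><a/></doc>\ntrailing text"))
```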

More to the point, I have two thoughts on your approach. One, I am
philosophically opposed to it because I think exception handling is 
called that for a reason: it is intended for exceptional cases. But 
that's just me (and some authors of books about good programming
practices).

In practical terms, I'm not familiar enough with the internals of the
Python SAX parser to be sure, but the way things normally work, once
the parser hits non-XML in the input it stops with an error, and you
don't get a second chance with that parse. So I think you would need
some logic that iterates over the input lines, repeatedly attempting to
start a fresh parse from each one until no exception is raised.
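
A rough sketch of that retry loop, assuming the whole input fits in
memory as a list of byte strings and that the junk comes only before
the XML. The function name find_xml_start is invented for this example.
Each attempt gets a fresh parse, since a failed one cannot be resumed.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def find_xml_start(lines, handler_factory=ContentHandler):
    """Return the index of the first line from which the rest of the
    input parses as XML, or None if no such line exists.

    Hypothetical helper: `lines` is a list of bytes objects, and
    `handler_factory` builds a new handler for every attempt so that
    events from a failed attempt are never mixed with a later one.
    """
    for start in range(len(lines)):
        candidate = b"".join(lines[start:])
        try:
            xml.sax.parseString(candidate, handler_factory())
        except xml.sax.SAXParseException:
            continue  # not XML from here; try the next line
        return start
    return None


if __name__ == "__main__":
    sample = [
        b"report generated by some tool\n",
        b"-----\n",
        b"<doc>\n",
        b"  <item/>\n",
        b"</doc>\n",
    ]
    print(find_xml_start(sample))
```

Note that this re-parses the tail of the input on every attempt, so the
cost grows roughly with the square of the input size, and it does not
cope with non-XML text after the document or between several documents.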

-- 
Matt Gushee
Englewood, Colorado, USA
mgushee@havenrock.com
http://www.havenrock.com/