[XML-SIG] parsing xml files delimited with non-xml text
Andy Robinson
andy@reportlab.com
Tue, 23 Apr 2002 23:18:32 +0100
> Should I strip out the non-xml data separately into xml-compliant
> pieces before calling the parse routine, or can I use exception
> handling within the xml routines to ignore the non-xml data until
> I see valid xml data?
Does the non-xml data consist of HTML tags (i.e. you
have XML chunks embedded in web pages), or totally
unrelated stuff like
================xml begins here=============
?
If the latter, a pragmatic approach says that string.split,
re and friends will do a pretty good job of cutting
it up. If the former, I see the temptation to try and
get away with a single parser, but you may be better
off using sgmlop or another non-XML parser to break things
into chunks. HTML parsers don't choke on singleton
tags, missing quotes and the other things that make
an XML parser give up.
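
For the delimiter case, here is a rough sketch of the cut-it-up-first approach. It assumes the separators are whole lines of '=' runs around some text, like the one above; the DELIMITER pattern, the parse_chunks name and the sample data are all made up for illustration, and it uses the standard library's xml.etree.ElementTree rather than whatever parser you already have. Chunks that are not well-formed XML are simply skipped.

```python
import re
import xml.etree.ElementTree as ET

# Assumed delimiter format: a full line of '=' runs wrapped around some text,
# e.g. "======xml begins here======". Adjust the pattern to the real data.
DELIMITER = re.compile(r"^=+[^=\n]*=+[ \t]*$", re.MULTILINE)


def parse_chunks(text):
    """Split text on delimiter lines and parse every chunk that is well-formed XML."""
    trees = []
    for chunk in DELIMITER.split(text):
        chunk = chunk.strip()
        if not chunk:
            continue
        try:
            trees.append(ET.fromstring(chunk))
        except ET.ParseError:
            # Not well-formed XML (log noise, headers, ...), so ignore it.
            continue
    return trees


# Made-up sample input for illustration only.
sample = """\
some log noise
================xml begins here=============
<order id="1"><item>widget</item></order>
================xml ends here=============
more noise
================xml begins here=============
<order id="2"><item>gadget</item></order>
"""

for tree in parse_chunks(sample):
    print(tree.tag, tree.get("id"))
```

Splitting first keeps the exception handling per chunk, so one bad piece does not stop the rest of the file from being parsed.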
Show us all a snippet and we'll be able to tell you the
most pragmatic route!
- Andy Robinson