[XML-SIG] parsing xml files delimited with non-xml text

Daniel Veillard veillard@redhat.com
Tue, 23 Apr 2002 13:49:00 -0400


On Tue, Apr 23, 2002 at 11:39:19AM -0600, Matt Gushee wrote:
> On Tue, Apr 23, 2002 at 11:57:48AM -0500, Brian Birkinbine wrote:
> > 
> > I would prefer to use exception handling because my functions to strip out non-xml data
> > would have to recognize the start of an xml file, and the xml parser already knows
> > how to detect the start of xml data.
> 
> Not really. It *assumes* the input is well-formed XML. No XML parser I
> know of (except possibly MSXML) is designed to detect XML embedded in
> non-XML.

  Actually, the XML specification is relatively clear on this point:
the parser cannot guess where the input ends. The relevant production is:
    http://www.w3.org/TR/REC-xml#NT-document
    [1]    document    ::=    prolog element Misc*

  The trailing Misc* means that the parser cannot find the end of the
content by itself; it has to be informed of the end of the stream.
Anything after the root element and up to that point must conform to
Misc* (comments, processing instructions, or white space), and if it
does not, it is actually a fatal error.
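
  As a rough illustration only, here is a minimal sketch using Python's
standard xml.etree.ElementTree module (chosen purely as an example
parser; the helper name parse_chunks is made up for this sketch). The
parser keeps accepting input after the root element, and only learns
that the stream is over when close() is called:

```python
import xml.etree.ElementTree as ET

def parse_chunks(chunks):
    """Feed text chunks to a parser, then signal end of stream."""
    parser = ET.XMLParser()
    for chunk in chunks:
        parser.feed(chunk)
    # close() is how the caller tells the parser that the input has ended.
    return parser.close()

# Comments, processing instructions and white space after the root
# element all match Misc*, so this parses without error.
root = parse_chunks(["<doc/>", "\n<!-- trailer -->", "<?pi data?>\n"])

# Character data after the root element does not match Misc*, so the
# parser reports a fatal error (raised from feed() or close()).
try:
    parse_chunks(["<doc/>", "\nplain text trailer"])
    trailing_text_rejected = False
except ET.ParseError:
    trailing_text_rejected = True
```

  So catching the exception tells you that something after the root
was not XML, but the caller still has to decide where each document
ends before handing the data to the parser.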

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/