[Expat-discuss] junk after document element at line 2053

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Tue May 18 11:01:29 EDT 2004


On Tue, 18 May 2004, Karl Waclawek wrote:

>
>----- Original Message ----- 
>From: "Dan Bolser" <dmb at mrc-dunn.cam.ac.uk>
>To: "Greg Martin" <Greg.Martin at telus.com>
>Cc: <expat-discuss at libexpat.org>
>Sent: Tuesday, May 18, 2004 10:33 AM
>
>> The real snag is the multiple XML documents in each file (or is that what
>> you mean?). It would be nice to be able to set a 'severity' switch, so the
>> parser keeps on going regardless.
>
>As Greg already stated, a conforming XML parser *must* report
>well-formedness errors, and in general, it is not possible
>to continue since a reasonable behaviour cannot always be defined.
>Examples: 
>- How should the parser continue if it encounters a start tag,
>  without an end tag? Should it ignore it? Should it read on past the
>  parent element's end tag to see if it has been misplaced?
>- How should the parser deal with an extra '<' character in the character
>  data stream? Is it the start of an element, or just a character?

I agree both these cases should not be ignored, but finding another start
tag after the 'final' end tag... why not have an option to just open it
and continue parsing? It would save me wedging my data between two rather
artificial and clumsy dummy 'outer' start and end tags. An option could be
called, 'assume outer tags' or something.

And on discovering more 'prolog' data (I think it is called that), why not
have an option to reinitialize the parser with this new information, or
just ignore it ...

These could be reported as 'found more prolog, assuming new document' if
the appropriate option were set.
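
A rough sketch of the 'found more prolog, assuming new document' behaviour,
done outside the parser rather than as an Expat option. It again uses Python's
xml.parsers.expat binding. The name parse_concatenated is hypothetical, and
the sketch assumes that every document after the first starts with an
"<?xml ...?>" declaration and that this string never appears inside a comment
or CDATA section.

```python
import re
import xml.parsers.expat


def parse_concatenated(text, on_start):
    """Split text at each XML declaration and parse every piece separately.

    Returns the number of documents parsed.
    """
    # Offsets of every "<?xml " declaration (does not match "<?xml-stylesheet").
    starts = [m.start() for m in re.finditer(r"<\?xml\s", text)]
    if not starts or starts[0] != 0:
        # Keep anything before the first declaration as its own piece.
        starts.insert(0, 0)
    ends = starts[1:] + [len(text)]

    count = 0
    for begin, end in zip(starts, ends):
        piece = text[begin:end]
        if not piece.strip():
            continue  # nothing but whitespace between documents
        # A fresh parser per document: one parser handles one document only.
        parser = xml.parsers.expat.ParserCreate()
        parser.StartElementHandler = on_start
        parser.Parse(piece, True)
        count += 1
    return count
```

Each piece is still checked for well-formedness in the normal way, so a
genuine error inside any one document is reported rather than skipped.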

I know the above sounds a bit strange, but I think it is quite Perlish
(and therefore normal to a Perl programmer), though perhaps not to your
mind.


>
>> One other thing, I often have to deal with character lines being
>> arbitrarily broken over multiple character event calls (even when each
>> string is very short). Is there any way to reset the internal character
>> thingie to ensure this doesn't happen? Else I just use the reworked code I
>> have, building up character data as it comes and processing on close tag
>> events.
>
>This is by design. Your current approach is the correct one: accumulate
>character data until the next end tag is encountered.

Thanks for the clarification. This one always trips me up!
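
For the archive, a minimal sketch of that accumulate-until-end-tag pattern,
written against Python's xml.parsers.expat binding instead of Perl. The
function name element_texts is hypothetical, and the sketch assumes simple
leaf elements whose text is not interleaved with child elements.

```python
import xml.parsers.expat


def element_texts(xml_text):
    """Return (element name, full text) pairs, joined at each end tag."""
    results = []
    buf = []  # pieces of character data seen since the last tag

    def start(name, attrs):
        # A new element begins: drop any text collected before it.
        buf.clear()

    def chars(data):
        # May be called several times for one run of text; just collect.
        buf.append(data)

    def end(name):
        # Only now is the text known to be complete.
        results.append((name, "".join(buf)))
        buf.clear()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)
    return results
```

The character handler never processes anything itself; all the real work
happens in the end-tag handler, so it does not matter how the parser chooses
to split the text across calls.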

Cheers,
Dan.


>Karl
>
>



