[Expat-discuss] Stopping the parse -- anybody home?

David Crowley dcrowley@scitegic.com
Wed, 18 Apr 2001 11:27:29 -0700


At 10:38 AM 4/18/2001, Fred L. Drake, Jr. wrote:

>Michael Roberts writes:
>  > It did indeed make it to the list and I was kind of hoping somebody would
>  > answer it.
>
>   Looks like our responses crossed in the mail!
>
>  > You might just keep a flag attached to the parse, and skip out of all
>  > handlers when it gets set.  That's the approach I'd try first.
>
>   Here's a (slightly) better approach that we use in the Python
>bindings for Expat:  when a Python handler raises an exception, we
>clear all the handlers registered with the parser instance being used.
>This avoids having to check a flag for each callback (which gives us
>more maintainable application code), and can be just a little faster.
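To make the difference between the two suggestions concrete without depending on Expat, here is a toy model in C++. `ToyParser`, its handler slots, and `collectUntilBarEnds()` are hypothetical illustrations and not Expat's API. Once the stop condition fires, the end handler clears every registered callback. The remaining events then reach no handler, and no handler has to check a flag on entry. With real Expat, the analogous step would be passing NULL to registration functions such as `XML_SetElementHandler()`. Note that clearing handlers only silences the callbacks. Expat still scans whatever is left in the current buffer.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy "parser" that only dispatches a list of pre-split events
// to whichever handlers are registered. This is NOT Expat's API.
struct ToyParser {
    using Handler = std::function<void(const std::string &)>;
    Handler startHandler;
    Handler endHandler;

    // Unregister every handler, like the Python bindings do on an exception.
    void clearHandlers() {
        startHandler = nullptr;
        endHandler = nullptr;
    }

    // Each event is (isStartTag, elementName).
    void run(const std::vector<std::pair<bool, std::string>> &events) {
        for (const auto &ev : events) {
            // Copy the handler so clearing the slot inside the callback
            // does not destroy the function object while it is running.
            Handler h = ev.first ? startHandler : endHandler;
            if (h)  // empty slot: nobody is listening, so the event is skipped
                h(ev.second);
        }
    }
};

// Records tags until </bar> is seen, then clears all handlers; no handler
// tests a "stop" flag on entry.
std::vector<std::string> collectUntilBarEnds() {
    ToyParser p;
    std::vector<std::string> seen;
    p.startHandler = [&](const std::string &name) {
        seen.push_back("<" + name + ">");
    };
    p.endHandler = [&](const std::string &name) {
        seen.push_back("</" + name + ">");
        if (name == "bar")
            p.clearHandlers();
    };
    p.run({{true, "foo"}, {true, "bar"}, {false, "bar"}, {false, "foo"}});
    return seen;
}
```

Calling `collectUntilBarEnds()` records `<foo>`, `<bar>`, and `</bar>`. The final `</foo>` event is dropped because its handler slot is empty by then.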


I actually tried to respond last weekend, but my mail bounced and I didn't 
get back to it.  My situation is that I need to break out of a parse and 
then continue it at a later time.  So I wrote a wrapper class around my 
file that reads it and returns "tokens", where a "token" is everything up 
to and including the next ">" character.  Each token is handed to Expat as 
a separate buffer, so my loop looks like this:

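#include <expat.h>

bool stopParse = false;

// Expat calls this for every end tag.
void
endElementHandler(void *userData, const XML_Char *name)
{
    if (needToStop)              // application-specific condition
       stopParse = true;
}

XML_Parser parser = XML_ParserCreate(NULL);
XML_SetEndElementHandler(parser, endElementHandler);
tokenizer t("myfile.xml");       // my wrapper class, described below

while (1)
{
    void *buffer = XML_GetBuffer(parser, 1024);
    if (buffer == NULL)
       break;                    // out of memory
    int read = t.readToken(static_cast<char *>(buffer), 1024);
    if (!XML_ParseBuffer(parser, read, read == 0))
       break;                    // parse error; see XML_GetErrorCode()
    if (stopParse || read == 0)
       break;                    // to resume: clear stopParse, re-enter loop
}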


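The tokens returned for an XML file of "<foo><bar>data</bar></foo>" are 
"<foo>", "<bar>", "data</bar>", and "</foo>".  Since Expat sees only one 
token per XML_ParseBuffer() call, very little more of the document gets 
processed after the flag is set.  And since Expat simply waits for the 
next buffer, re-entering the loop later picks up where it left off.  I 
guess you could also write the tokenizer to break things up a little more, 
for example splitting the "data</bar>" token.  But that's the general 
idea.  The Xerces parser does something similar with its progressive-parse 
tokens, but I MUCH prefer Expat.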
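Here is a minimal sketch of such a tokenizer, assuming it reads from a `std::istream` (for example a `std::ifstream` opened on myfile.xml). The class name `Tokenizer` and the `tokenize()` helper are hypothetical and are not the actual wrapper class from the post. `readToken()` copies everything up to and including the next `>` into the caller's buffer. A token longer than the buffer comes back in pieces over several calls. Because the stream keeps its position between calls, the feeding loop can stop and resume at a token boundary.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical tokenizer: returns everything up to and including the
// next '>' from the wrapped stream.
class Tokenizer {
public:
    explicit Tokenizer(std::istream &in) : in_(in) {}

    // Copies the next token into buf (at most len bytes) and returns the
    // number of bytes copied; 0 means end of input. A token longer than
    // len is returned in pieces over several calls.
    int readToken(char *buf, int len) {
        assert(buf != nullptr && len > 0);
        int n = 0;
        char c;
        while (n < len && in_.get(c)) {
            buf[n++] = c;
            if (c == '>')
                break;  // token complete
        }
        return n;
    }

private:
    std::istream &in_;  // position persists between calls, so reading can resume
};

// Collects every token of an in-memory document, for demonstration only.
std::vector<std::string> tokenize(const std::string &xml) {
    std::istringstream in(xml);
    Tokenizer t(in);
    std::vector<std::string> tokens;
    char buf[1024];
    int n;
    while ((n = t.readToken(buf, static_cast<int>(sizeof buf))) > 0)
        tokens.emplace_back(buf, n);
    return tokens;
}
```

For `tokenize("<foo><bar>data</bar></foo>")` this yields the four tokens listed above. In the real loop, the buffer passed to `readToken()` would be the one obtained from `XML_GetBuffer()` instead of a local array.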


David