[Expat-discuss] interface to XML_GetInputContext

Michael Isard michael.isard@compaq.com
Wed, 20 Jun 2001 10:57:51 -0700 (PDT)


I have a request for strengthening the interface to
XML_GetInputContext. The current comment in expat.h reads:

/* If XML_CONTEXT_BYTES is defined, returns the input buffer, sets
   the integer pointed to by offset to the offset within this buffer
   of the current parse position, and sets the integer pointed to by size
   to the size of this buffer (the number of input bytes). Otherwise
   returns a null pointer. Also returns a null pointer if a parse isn't active.

   NOTE: The character pointer returned should not be used outside
   the handler that makes the call. */


First, I would like it to guarantee that
  (size - offset) >= XML_GetCurrentByteCount(parser)
i.e. that the whole of the current event is contained within the
buffer. Without this guarantee the client has to write extra code to
save partial data.
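
To make the first request concrete, here is a minimal sketch of the
check a client would otherwise have to make in every handler.
event_in_context is a hypothetical helper, not part of expat; it only
restates the inequality above.

```c
#include <assert.h>

/* Hypothetical helper, not part of expat. Returns 1 when the current
   event (byte_count bytes starting at offset) lies entirely inside a
   context buffer of the given size, and 0 otherwise. */
static int event_in_context(int offset, int size, int byte_count)
{
  assert(offset >= 0 && size >= offset && byte_count >= 0);
  return (size - offset) >= byte_count;
}
```

In a handler it would be called with the offset and size filled in by
XML_GetInputContext and the value of XML_GetCurrentByteCount(parser).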

Second, I would like it to guarantee that the last character in the
returned buffer corresponds to the last character passed to the most
recent call of XML_Parse or XML_ParseBuffer (not necessarily at the
same memory location; I only need all the data that has been passed
to the parser since the start of the current event to be visible in
the buffer).
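
Put differently, the bytes from the current parse position to the end
of the returned buffer should be exactly the input the parser has been
given but has not yet moved past. A minimal sketch of extracting that
tail follows; context_tail is a hypothetical helper, not part of expat.

```c
#include <stddef.h>

/* Hypothetical helper, not part of expat. Given the pointer, offset and
   size reported by XML_GetInputContext, returns a pointer to the current
   parse position and stores the number of bytes from there to the end
   of the buffer in *tail_len. Returns NULL (and a length of 0) when
   there is no usable context. */
static const char *context_tail(const char *buf, int offset, int size,
                                int *tail_len)
{
  if (buf == NULL || offset < 0 || offset > size) {
    *tail_len = 0;
    return NULL;
  }
  *tail_len = size - offset;
  return buf + offset;
}
```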

As far as I can tell from reading the source, both are true in the
current implementation, but I would be very grateful to be corrected if
I have misread it.

I do have a reason for the second request: I am writing a routine
which streams an XML document rewriting some characters near the
beginning but leaving the bulk untouched. Therefore I would like to be
able to detect in a callback that I have finished all the rewriting I
am going to do, and simply copy the rest of the file rather than
parsing it. I am currently implementing this strategy as follows:


...

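/* Called for the event after which no more rewriting is needed. */
void final_callback(void *ud)
{
  mydata d = (mydata) ud;
  int offset, size;
  const char *buf;  /* XML_GetInputContext returns a const pointer */

  buf = XML_GetInputContext(d->parser, &offset, &size);
  if (buf == NULL)
    return;  /* no context: XML_CONTEXT_BYTES undefined or no active parse */
  d->finished = TRUE;
  /* flush everything from the current parse position to the end of the buffer */
  d->write(buf + offset, size - offset);
}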

...

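  /* main parse loop: feed the parser until final_callback sets d->finished */
  n = 1;
  while (!(d->finished) && n > 0) {
    buf = XML_GetBuffer(d->parser, bufsize);
    if (buf == NULL)
      break;  /* XML_GetBuffer failed (out of memory) */
    n = d->read(buf, bufsize);
    if (n < 0 || !XML_ParseBuffer(d->parser, n, n == 0))
      break;  /* read error or parse error */
  }
  /* rewriting is done: copy the rest of the input through unparsed */
  while (d->finished && n > 0) {
    n = d->read(static_buf, bufsize);
    if (n > 0)
      d->write(static_buf, n);
  }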

...


but this will only work if the second condition above is satisfied. Do
you have any advice for a better way to do this if you don't want to
change the interface?
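
To show why the strategy depends on the second condition, here is a
self-contained toy model of it that does not use expat at all. The
stream struct, stream_read, stream_write, toy_parse and rewrite_stream
are all hypothetical stand-ins: toy_parse "rewrites" by upper-casing
bytes up to the first '>' and then flushes the tail of its buffer, the
way final_callback flushes buf + offset. If that tail did not reach the
end of the data most recently handed to the parser, the bytes in
between would be lost from the output.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical in-memory stream standing in for d->read and d->write. */
struct stream {
  const char *in;   /* unread input */
  size_t in_len;
  char out[256];    /* everything written so far, NUL-terminated */
  size_t out_len;
};

static int stream_read(struct stream *s, char *dst, int cap)
{
  int n = (s->in_len < (size_t) cap) ? (int) s->in_len : cap;
  memcpy(dst, s->in, (size_t) n);
  s->in += n;
  s->in_len -= (size_t) n;
  return n;  /* 0 at end of input */
}

static void stream_write(struct stream *s, const char *src, int n)
{
  if (n <= 0 || s->out_len + (size_t) n >= sizeof s->out)
    return;  /* nothing to write, or toy output buffer full */
  memcpy(s->out + s->out_len, src, (size_t) n);
  s->out_len += (size_t) n;
  s->out[s->out_len] = '\0';
}

/* Toy stand-in for the parse phase. Upper-cases bytes up to and
   including the first '>', then flushes the rest of the chunk
   unchanged. Returns 1 once the rewriting is finished. */
static int toy_parse(struct stream *s, const char *buf, int size)
{
  int offset;
  for (offset = 0; offset < size; offset++) {
    char c = (char) toupper((unsigned char) buf[offset]);
    stream_write(s, &c, 1);
    if (buf[offset] == '>') {
      offset++;
      stream_write(s, buf + offset, size - offset);  /* the tail */
      return 1;
    }
  }
  return 0;
}

/* Two-phase loop: "parse" until finished, then copy the rest raw. */
static void rewrite_stream(struct stream *s, int bufsize)
{
  char buf[64];
  int n = 1, finished = 0;

  if (bufsize < 1 || bufsize > (int) sizeof buf)
    bufsize = (int) sizeof buf;
  while (!finished && n > 0) {
    n = stream_read(s, buf, bufsize);
    finished = toy_parse(s, buf, n);
  }
  while (finished && n > 0) {
    n = stream_read(s, buf, bufsize);
    stream_write(s, buf, n);
  }
}
```

The output should be the same whatever chunk size is used, which is
only true because the flushed tail always runs to the end of the chunk.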

Thanks,
Michael Isard.