[XML-SIG] Proposed Expat API changes

Mike Olson Mike.Olson@fourthought.com
08 Aug 2002 12:52:14 -0600


On Thu, 2002-08-08 at 12:46, Fred L. Drake, Jr. wrote:

I think option 1 is the best choice.  It will not break code unless
someone goes in and adds calls to suspend the parser.  As mentioned,
that would break code that does not expect the new return value from
XML_Parse(), etc.  However, if they are already in there making
changes, they might as well change both places.

Mike


> 
> I've proposed some changes to Expat's C API on the expat-discuss list;
> these changes would allow pull-based and mixed-mode parsers to be
> built on top of Expat.
> 
> Unfortunately, the message hasn't appeared in the online archives;
> this is the cost of using SF's mailing lists.  ;-(  I've attached the
> proposal to this email, in case anyone is interested.  Followups
> pertaining to Expat's C API should be directed to the expat-discuss
> list:
> 
>         http://sourceforge.net/mail/?group_id=10127
> 
> 
>   -Fred
> 
> -- 
> Fred L. Drake, Jr.  <fdrake at acm.org>
> PythonLabs at Zope Corporation
> 
> ----
> 

> Implementing a blocking mode in Expat
> =====================================
> 
> Requests for a pull-based API for Expat have surfaced a few times over
> (at least) the last couple of years; there is a feature request for
> this on SourceForge (issue #544682):
> 
> http://sourceforge.net/tracker/index.php?func=detail&aid=544682&group_id=10127&atid=110127
> 
> An additional motivation is that we'd like to be able to share a
> codebase with the Mozilla project, which is currently using a
> substantially modified version of an older version of Expat.
> 
> Pull-based parsers have become increasingly popular as the limitations
> of DOM- or SAX-like APIs have become better known.  The pull-based
> APIs provide an opportunity to build each part of an application in
> the way that's most appropriate, allowing a mixture of DOM- and
> SAX-like behaviors.
> 
> Expat could provide the basis for an efficient pull-based API if it
> offered an opportunity to suspend parsing temporarily, allowing
> parsing to resume when the application is ready for additional
> information from the document.  A .NET-like API could easily be built
> on top of such a feature.
> 
> Karl Waclawek and I have been having discussions about this, and think
> we have a good idea of how to introduce such a feature into Expat.
> There are questions and issues regarding the possible API that would
> need to be exposed; I've summarized our ideas and analysis below in the
> form of two alternate API proposals.
> 
> We welcome feedback and discussion, including the introduction of
> additional API proposals, on the expat-discuss list.
> 
> 
> Supporting Information
> ----------------------
> 
> Expat 1.95.6 / 1.96 will include a new enumeration, XML_Status,
> specifying return values for the XML_Parse() and XML_ParseBuffer()
> functions.  Our recommendation is that the result of XML_Parse() and
> XML_ParseBuffer() be tested for these values specifically, even when
> using older versions of Expat 1.95.x -- this will be completely
> equivalent in practice.  This change allows us to extend the number of
> possible return values in the future; the documented API in Expat 1.95
> through 1.95.4 really only defines a boolean interpretation of these
> return values, but only the two specific values, now named by
> XML_Status enum names, were actually used.
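
A minimal sketch of the recommended test, comparing the result against the
named values rather than treating it as a boolean.  The enum is a local
stand-in so the fragment compiles without expat.h, and describe_status() is
a hypothetical helper, not an Expat function.

```c
/* Stand-in for the XML_Status enumeration described above, so the
   sketch compiles without expat.h.  Expat defines the error value as 0
   and the success value as 1. */
enum XML_Status { XML_STATUS_ERROR = 0, XML_STATUS_OK = 1 };

/* Hypothetical helper (not part of Expat): classify a value returned by
   XML_Parse() or XML_ParseBuffer() by comparing against the names. */
static const char *describe_status(int status)
{
    switch (status) {
    case XML_STATUS_OK:
        return "ok";
    case XML_STATUS_ERROR:
        return "error";
    default:
        /* A value added by a later release lands here instead of
           being silently treated as success. */
        return "unrecognized";
    }
}
```

A plain `if (!XML_Parse(...))` test has no such default branch, which is why
the explicit comparison leaves room for additional return values.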
> 
> 
> API Option 1
> ------------
> 
> This alternative introduces two new functions and three new constants.
> These are only needed if an application uses the new functionality.
> 
> XML_STATUS_SUSPENDED
> 
>     New value in the XML_Status enumeration.  This is only used if
>     XML_SuspendParser() has been called.
> 
> XML_ERROR_NOT_SUSPENDED
> XML_ERROR_SUSPENDED
> 
>     These new error codes would be used to indicate that a call to the
>     parser was made when the parser was not in the expected internal
>     state, and indicate programming errors in the application.
> 
> XML_Status
> XML_SuspendParser(XML_Parser parser)
> 
>     Inform the parser that parsing should be suspended when the
>     currently active callback returns.  It should only be called from
>     a callback.  Returns XML_STATUS_OK or XML_STATUS_ERROR.  Multiple
>     calls to XML_SuspendParser() during a callback are allowed, and
>     are equivalent to a single call to XML_SuspendParser().  It is an
>     error to call this function while a callback function is not
>     active.
> 
> XML_Status
> XML_ResumeParser(XML_Parser parser)
> 
>     Resume parsing using a suspended parser.  Returns XML_STATUS_OK,
>     XML_STATUS_ERROR, or XML_STATUS_SUSPENDED.  If the parser has not
>     been suspended, this returns XML_STATUS_ERROR, and
>     XML_GetErrorCode() returns XML_ERROR_NOT_SUSPENDED.  The parser is
>     not invalidated in this case, and parsing may be continued with
>     additional input using XML_Parse() or XML_ParseBuffer().
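
The rules for the two new functions can be modelled as a small state
machine.  This is only a sketch of the semantics described above, not Expat
code: ToyParser, toy_suspend(), toy_dispatch(), toy_resume() and
suspend_twice() are hypothetical names, and the numeric enum values are
assumptions.

```c
/* Stand-ins so the sketch compiles without Expat; the numeric values
   are assumptions, not taken from the proposal. */
enum XML_Status { XML_STATUS_ERROR = 0, XML_STATUS_OK = 1, XML_STATUS_SUSPENDED = 2 };
enum XML_Error { XML_ERROR_NONE = 0, XML_ERROR_NOT_SUSPENDED, XML_ERROR_SUSPENDED };

/* Hypothetical model of the parser state relevant to suspension. */
struct ToyParser {
    int in_callback;        /* nonzero while a callback is running */
    int suspend_requested;  /* set by toy_suspend() during a callback */
    int suspended;          /* nonzero once the parser has stopped */
    enum XML_Error error;   /* what XML_GetErrorCode() would report */
};

typedef void (*ToyCallback)(struct ToyParser *);

/* Models XML_SuspendParser(): only valid inside a callback. */
static enum XML_Status toy_suspend(struct ToyParser *p)
{
    if (!p->in_callback)
        return XML_STATUS_ERROR;  /* the proposal names no error code here */
    p->suspend_requested = 1;     /* repeated calls have no extra effect */
    return XML_STATUS_OK;
}

/* Models the parser invoking one callback, then honoring any request
   to suspend once that callback has returned. */
static enum XML_Status toy_dispatch(struct ToyParser *p, ToyCallback cb)
{
    p->in_callback = 1;
    cb(p);
    p->in_callback = 0;
    if (p->suspend_requested) {
        p->suspend_requested = 0;
        p->suspended = 1;
        return XML_STATUS_SUSPENDED;
    }
    return XML_STATUS_OK;
}

/* Models XML_ResumeParser(). */
static enum XML_Status toy_resume(struct ToyParser *p)
{
    if (!p->suspended) {
        p->error = XML_ERROR_NOT_SUSPENDED;
        return XML_STATUS_ERROR;  /* the parser stays usable */
    }
    p->suspended = 0;
    return XML_STATUS_OK;
}

/* Example callback that asks for suspension twice. */
static void suspend_twice(struct ToyParser *p)
{
    toy_suspend(p);
    toy_suspend(p);
}
```

Dispatching suspend_twice() suspends the model exactly once, and a second
toy_resume() afterwards reports XML_ERROR_NOT_SUSPENDED.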
> 
> The following functions change:
> 
> XML_Status
> XML_Parse(XML_Parser parser, const char *s, int len, int isFinal)
> 
> XML_Status
> XML_ParseBuffer(XML_Parser parser, int len, int isFinal)
> 
>     These two existing functions will change the meaning of their
>     return value slightly.  If parsing is suspended using
>     XML_SuspendParser(), they will return XML_STATUS_SUSPENDED,
>     otherwise the current values of XML_STATUS_OK and XML_STATUS_ERROR
>     may be returned.
> 
>     If XML_STATUS_SUSPENDED is returned, the parse of the input
>     document can only be resumed using XML_ResumeParser().  If
>     XML_Parse() or XML_ParseBuffer() is called on a suspended parser,
>     XML_STATUS_ERROR will be returned, and XML_GetErrorCode() will
>     return XML_ERROR_SUSPENDED.  The parser is not invalidated in
>     this case, and parsing may still be resumed.
> 
> void *
> XML_GetBuffer(XML_Parser parser, int len)
> 
>     If the parser has been suspended, returns NULL and
>     XML_GetErrorCode() returns XML_ERROR_SUSPENDED.  Parsing the input
>     which has already been passed into Expat should be continued using
>     XML_ResumeParser().  No changes if the parser was not suspended.
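
A main loop written for Option 1 might look roughly like the sketch below.
It runs against a toy parser rather than Expat: each '|' in the input stands
for a callback that requested suspension, and ToyParser, toy_run(),
toy_parse(), toy_resume() and drive() are hypothetical names.  The numeric
enum values are assumptions.

```c
#include <string.h>

/* Stand-ins so the sketch compiles without Expat. */
enum XML_Status { XML_STATUS_ERROR = 0, XML_STATUS_OK = 1, XML_STATUS_SUSPENDED = 2 };
enum XML_Error { XML_ERROR_NONE = 0, XML_ERROR_NOT_SUSPENDED, XML_ERROR_SUSPENDED };

/* Hypothetical toy parser: every '|' in the input stands for a
   callback that requested suspension. */
struct ToyParser {
    const char *pos;       /* next unread byte of the current buffer */
    const char *end;       /* one past the last byte of that buffer */
    int suspended;
    enum XML_Error error;  /* what XML_GetErrorCode() would report */
};

/* Consume buffered input until it runs out or a suspension occurs. */
static enum XML_Status toy_run(struct ToyParser *p)
{
    while (p->pos < p->end) {
        if (*p->pos++ == '|') {
            p->suspended = 1;
            return XML_STATUS_SUSPENDED;
        }
    }
    return XML_STATUS_OK;
}

/* Models XML_Parse() under Option 1. */
static enum XML_Status toy_parse(struct ToyParser *p, const char *s, int len)
{
    if (p->suspended) {
        p->error = XML_ERROR_SUSPENDED;
        return XML_STATUS_ERROR;  /* buffered input is left untouched */
    }
    p->pos = s;
    p->end = s + len;
    return toy_run(p);
}

/* Models XML_ResumeParser(). */
static enum XML_Status toy_resume(struct ToyParser *p)
{
    if (!p->suspended) {
        p->error = XML_ERROR_NOT_SUSPENDED;
        return XML_STATUS_ERROR;
    }
    p->suspended = 0;
    return toy_run(p);
}

/* Main loop: returns the number of suspensions, or -1 on error. */
static int drive(struct ToyParser *p, const char **chunks, int n)
{
    int suspensions = 0;
    int i;
    for (i = 0; i < n; i++) {
        enum XML_Status st = toy_parse(p, chunks[i], (int)strlen(chunks[i]));
        while (st == XML_STATUS_SUSPENDED) {
            suspensions++;        /* the application would do its work here */
            st = toy_resume(p);   /* may itself report another suspension */
        }
        if (st == XML_STATUS_ERROR)
            return -1;
    }
    return suspensions;
}
```

The inner loop matters because XML_ResumeParser() can return
XML_STATUS_SUSPENDED again before the current buffer is exhausted.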
> 
> 
> Potential Issues
> ----------------
> 
> The risk inherent in this API variant is that it does change the
> interpretation of the return code for XML_Parse() and
> XML_ParseBuffer().  This is only significant if any callback ever
> calls XML_SuspendParser().  In the case of suspension,
> XML_STATUS_SUSPENDED would be returned, but an existing main loop will
> recognize this as a successful parse.  This would be a programming
> error in the revised API, but not the old API.  If the buffer being
> parsed was not the last buffer, a reasonable error would be returned
> when the main loop calls XML_Parse() or XML_ParseBuffer() again,
> but if the last input buffer was already passed (isFinal is
> true), there would be no opportunity to report the error, possibly
> making it difficult to diagnose application errors introduced by this
> change.
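
The failure mode described above can be reproduced with a toy model.  This
is a sketch under stated assumptions, not Expat behavior: ToyParser,
toy_parse() and legacy_parse_document() are hypothetical, '|' stands for a
callback that suspends, and the value 2 for XML_STATUS_SUSPENDED is assumed.

```c
#include <string.h>

/* Stand-ins so the sketch compiles without Expat. */
enum XML_Status { XML_STATUS_ERROR = 0, XML_STATUS_OK = 1, XML_STATUS_SUSPENDED = 2 };

/* Hypothetical toy parser that only counts the bytes it has handled. */
struct ToyParser {
    int consumed;   /* bytes processed so far */
    int suspended;  /* nonzero once a suspension has occurred */
};

/* Models XML_Parse() under Option 1; '|' stands for a callback that
   called XML_SuspendParser(). */
static int toy_parse(struct ToyParser *p, const char *s, int len)
{
    int i;
    for (i = 0; i < len; i++) {
        p->consumed++;
        if (s[i] == '|') {
            p->suspended = 1;
            return XML_STATUS_SUSPENDED;
        }
    }
    return XML_STATUS_OK;
}

/* A main loop written against the boolean contract: any nonzero result
   counts as success.  Returns 1 if it believes the document parsed. */
static int legacy_parse_document(struct ToyParser *p, const char *doc)
{
    /* The whole document is passed as the final buffer. */
    if (!toy_parse(p, doc, (int)strlen(doc)))
        return 0;
    return 1;
}
```

For the input "ab|cd" the legacy loop reports success although only three of
the five bytes were processed, and nothing is left to call that would
surface the mistake.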
> 
> We don't know how important this change is in practice for Expat
> 1.95.x users; we would appreciate feedback on the expat-discuss list.
> 
> 
> API Option 2
> ------------
> 
> This version of the API changes provides increased backward
> compatibility, at the cost of a cruftier API to Expat.
> 
> An alternate version of the API also adds the XML_SuspendParser() and
> XML_ResumeParser() functions, and the new XML_ERROR_* constants, but
> not the new XML_Status value.  This variant would describe suspension
> as a pseudo-error from the XML_Parse() and XML_ParseBuffer()
> functions, allowing existing applications to report "errors" from the
> main loop if they had not been prepared for the suspension feature,
> but some callback function called XML_SuspendParser().  This would
> only be expected to occur during development, but applications that
> only suspend parsing occasionally may find that poorly tested code
> paths expose problems late in the development cycle or even after the
> application has entered production.
> 
> The alternate version uses this description for XML_Parse() and
> XML_ParseBuffer():
> 
> XML_Status
> XML_Parse(XML_Parser parser, const char *s, int len, int isFinal)
> 
> XML_Status
> XML_ParseBuffer(XML_Parser parser, int len, int isFinal)
> 
>     If XML_STATUS_ERROR is returned, a main loop which supports the
>     suspension feature needs to check whether XML_GetErrorCode(parser)
>     == XML_ERROR_SUSPENDED.  If so, the parse was suspended and the
>     call to continue the parse needs to be XML_ResumeParser().
>     Otherwise, the error is "real".
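
For comparison, a main loop for Option 2 might be sketched as follows,
again against a toy parser rather than Expat.  '|' stands for a callback
that suspends and '!' for a genuine syntax error; ToyParser, toy_run(),
toy_parse(), toy_resume() and drive() are hypothetical names.  That
XML_ResumeParser() would report a further suspension through the same
pseudo-error is an assumption, since the proposal does not spell it out.

```c
#include <string.h>

/* Stand-ins so the sketch compiles without Expat; note that Option 2
   has no XML_STATUS_SUSPENDED.  The error values are assumptions. */
enum XML_Status { XML_STATUS_ERROR = 0, XML_STATUS_OK = 1 };
enum XML_Error {
    XML_ERROR_NONE = 0,
    XML_ERROR_SYNTAX,
    XML_ERROR_NOT_SUSPENDED,
    XML_ERROR_SUSPENDED
};

/* Hypothetical toy parser: '|' suspends, '!' is a real error. */
struct ToyParser {
    const char *pos;       /* next unread byte of the current buffer */
    const char *end;       /* one past the last byte of that buffer */
    int suspended;
    enum XML_Error error;  /* what XML_GetErrorCode() would report */
};

/* Consume buffered input; suspension is reported as a pseudo-error. */
static enum XML_Status toy_run(struct ToyParser *p)
{
    p->error = XML_ERROR_NONE;
    while (p->pos < p->end) {
        char c = *p->pos++;
        if (c == '|') {
            p->suspended = 1;
            p->error = XML_ERROR_SUSPENDED;
            return XML_STATUS_ERROR;
        }
        if (c == '!') {
            p->error = XML_ERROR_SYNTAX;
            return XML_STATUS_ERROR;
        }
    }
    return XML_STATUS_OK;
}

/* Models XML_Parse() under Option 2 (state checks omitted for brevity). */
static enum XML_Status toy_parse(struct ToyParser *p, const char *s, int len)
{
    p->pos = s;
    p->end = s + len;
    return toy_run(p);
}

/* Models XML_ResumeParser(). */
static enum XML_Status toy_resume(struct ToyParser *p)
{
    if (!p->suspended) {
        p->error = XML_ERROR_NOT_SUSPENDED;
        return XML_STATUS_ERROR;
    }
    p->suspended = 0;
    return toy_run(p);
}

/* Main loop: returns the number of suspensions, or -1 on a real error. */
static int drive(struct ToyParser *p, const char *doc)
{
    int suspensions = 0;
    enum XML_Status st = toy_parse(p, doc, (int)strlen(doc));
    /* Every error return needs a second look at the error code. */
    while (st == XML_STATUS_ERROR && p->error == XML_ERROR_SUSPENDED) {
        suspensions++;
        st = toy_resume(p);
    }
    return st == XML_STATUS_ERROR ? -1 : suspensions;
}
```

Compared with Option 1, the loop condition has to consult two values to
distinguish a suspension from a failure.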
> 
> This approach conflates error codes with the state of the parse, and
> labels the normal operation of the parser as an error.
-- 
Mike Olson                                Principal Consultant
mike.olson@fourthought.com                +1 303 583 9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St,                      http://4Suite.org
Boulder, CO 80301-2537, USA
XML strategy, XML tools, knowledge management