[XML-SIG] Proposed Expat API changes
Mike Olson
Mike.Olson@fourthought.com
08 Aug 2002 12:52:14 -0600
On Thu, 2002-08-08 at 12:46, Fred L. Drake, Jr. wrote:
I think option 1 is the best choice. It will not break code unless
someone goes in and adds calls to suspend the parser. As mentioned,
that would break with the new return values from XML_Parse(), etc.;
however, if they are in there making changes, they might as well
change two places.
Mike
>
> I've proposed some changes to Expat's C API on the expat-discuss list;
> these changes would allow pull-based and mixed-mode parsers to be
> built on top of Expat.
>
> Unfortunately, the message hasn't appeared in the online archives;
> this is the cost of using SF's mailing lists. ;-( I've attached the
> proposal to this email, in case anyone is interested. Followups
> pertaining to Expat's C API should be directed to the expat-discuss
> list:
>
> http://sourceforge.net/mail/?group_id=10127
>
>
> -Fred
>
> --
> Fred L. Drake, Jr. <fdrake at acm.org>
> PythonLabs at Zope Corporation
>
> ----
>
> Implementing a blocking mode in Expat
> =====================================
>
> Requests for a pull-based API for Expat have surfaced a few times over
> (at least) the last couple of years; there is a feature request for
> this on SourceForge (issue #544682):
>
> http://sourceforge.net/tracker/index.php?func=detail&aid=544682&group_id=10127&atid=110127
>
> An additional motivation is that we'd like to be able to share a
> codebase with the Mozilla project, which is currently using a
> substantially modified version of an older version of Expat.
>
> Pull-based parsers have become increasingly popular as the limitations
> of DOM- or SAX-like APIs have become better known. The pull-based
> APIs provide an opportunity to build each part of an application in
> the way that's most appropriate, allowing a mixture of DOM- and
> SAX-like behaviors.
>
> Expat could provide the basis for an efficient pull-based API if it
> offered an opportunity to suspend parsing temporarily, allowing
> parsing to resume when the application is ready for additional
> information from the document. A .NET-like API could easily be built
> on top of such a feature.
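To make the idea concrete: a pull API can sit on top of suspension if the callback records a single event and asks the parser to suspend, and the application's "next event" call resumes the parser to obtain the following one. The sketch below is a toy model in C, not Expat. The `toy_*` parser (which treats each input byte as one event), the `puller_*` wrapper, and the `STATUS_*` values are hypothetical names invented for illustration; they only mirror the proposal's XML_SuspendParser()/XML_ResumeParser() idea.

```c
#include <stddef.h>

/* Toy model (hypothetical names), not Expat. Statuses mirror the proposal. */
enum status { STATUS_ERROR = 0, STATUS_OK = 1, STATUS_SUSPENDED = 2 };

/* Toy push parser: each input byte is delivered as one "event". */
struct toy_parser {
    const char *input;
    size_t len, pos;
    int in_callback, suspend_requested, suspended;
    void (*handler)(struct toy_parser *p, char event, void *user_data);
    void *user_data;
};

/* Modeled on the proposed XML_SuspendParser(): only legal in a callback. */
enum status toy_suspend(struct toy_parser *p) {
    if (!p->in_callback)
        return STATUS_ERROR;
    p->suspend_requested = 1;       /* repeated calls act like a single call */
    return STATUS_OK;
}

/* Deliver events until the input ends or a callback requests suspension. */
static enum status toy_run(struct toy_parser *p) {
    while (p->pos < p->len) {
        char event = p->input[p->pos++];
        p->in_callback = 1;
        p->handler(p, event, p->user_data);
        p->in_callback = 0;
        if (p->suspend_requested) { /* honored only after the callback returns */
            p->suspend_requested = 0;
            p->suspended = 1;
            return STATUS_SUSPENDED;
        }
    }
    return STATUS_OK;
}

/* Modeled on the proposed XML_ResumeParser(). */
enum status toy_resume(struct toy_parser *p) {
    if (!p->suspended)
        return STATUS_ERROR;
    p->suspended = 0;
    return toy_run(p);
}

/* Hypothetical pull layer: one event per call to puller_next(). */
struct puller {
    struct toy_parser parser;
    char current;
    int started, done;
};

static void grab_one_event(struct toy_parser *p, char event, void *user_data) {
    ((struct puller *)user_data)->current = event;
    toy_suspend(p);                 /* hand control back to puller_next() */
}

void puller_init(struct puller *u, const char *doc, size_t len) {
    struct toy_parser p = {0};
    p.input = doc;
    p.len = len;
    p.handler = grab_one_event;
    p.user_data = u;
    u->parser = p;
    u->current = 0;
    u->started = u->done = 0;
}

/* Returns 1 and stores the next event in *event, or 0 at end of input. */
int puller_next(struct puller *u, char *event) {
    enum status s;
    if (u->done)
        return 0;
    s = u->started ? toy_resume(&u->parser) : toy_run(&u->parser);
    u->started = 1;
    if (s == STATUS_SUSPENDED) {
        *event = u->current;
        return 1;
    }
    u->done = 1;                    /* STATUS_OK: input exhausted (no errors in this toy) */
    return 0;
}
```

Each call to `puller_next()` runs the parser for exactly one callback, which is the inversion of control a reader-style API needs; because suspension only takes effect after the callback returns, handlers always run to completion.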
>
> Karl Waclawek and I have been having discussions about this, and think
> we have a good idea of how to introduce such a feature into Expat.
> There are questions and issues regarding the possible API that would
> need to be exposed; I've summarized our ideas and analysis below in the
> form of two alternate API proposals.
>
> We welcome feedback and discussion, including the introduction of
> additional API proposals, on the expat-discuss list.
>
>
> Supporting Information
> ----------------------
>
> Expat 1.95.6 / 1.96 will include a new enumeration, XML_Status,
> specifying return values for the XML_Parse() and XML_ParseBuffer()
> functions. Our recommendation is that the result of XML_Parse() and
> XML_ParseBuffer() be tested for these values specifically, even when
> using older versions of Expat 1.95.x -- this will be completely
> equivalent in practice. This change allows us to extend the number of
> possible return values in the future; the documented API in Expat 1.95
> through 1.95.4 really only defines a boolean interpretation of these
> return values, but only the two specific values, now named by
> XML_Status enum names, were actually used.
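A small illustration of why the explicit test matters: a boolean check folds any new nonzero status into "success", while comparing against the named values keeps an unknown status visible. The enum below is a local stand-in, not Expat's header; it assumes 0 for error and 1 for OK, and a hypothetical third value 2 standing in for a status added later.

```c
/* Local stand-in values (assumed), not Expat's header. */
enum status { STATUS_ERROR = 0, STATUS_OK = 1, STATUS_NEW = 2 /* hypothetical */ };

enum outcome { OUTCOME_OK, OUTCOME_ERROR, OUTCOME_UNKNOWN };

/* Old style: the result is read as a boolean, so any nonzero value passes. */
int boolean_check_ok(int result) {
    return result != 0;
}

/* Recommended style: compare against the named values explicitly. */
enum outcome explicit_check(int result) {
    if (result == STATUS_OK)
        return OUTCOME_OK;
    if (result == STATUS_ERROR)
        return OUTCOME_ERROR;
    return OUTCOME_UNKNOWN;         /* e.g. a status added in a later release */
}
```

Against the real library the same habit reads `if (XML_Parse(p, buf, n, isFinal) == XML_STATUS_ERROR)` rather than `if (!XML_Parse(...))`.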
>
>
> API Option 1
> ------------
>
> This alternative introduces two new functions and three new constants.
> These are only needed if an application uses the new functionality.
>
> XML_STATUS_SUSPENDED
>
> New value in the XML_Status enumeration. This is only used if
> XML_SuspendParser() has been called.
>
> XML_ERROR_NOT_SUSPENDED
> XML_ERROR_SUSPENDED
>
> These new error codes would be used to indicate that a call to the
> parser was made when the parser was not in the expected internal
> state, and indicate programming errors in the application.
>
> XML_Status
> XML_SuspendParser(XML_Parser parser)
>
> Inform the parser that parsing should be suspended when the
> currently active callback returns. It should only be called from
> a callback. Returns XML_STATUS_OK or XML_STATUS_ERROR. Multiple
> calls to XML_SuspendParser() during a callback are allowed, and
> are equivalent to a single call to XML_SuspendParser(). It is an
> error to call this function while a callback function is not
> active.
>
> XML_Status
> XML_ResumeParser(XML_Parser parser)
>
> Resume parsing using a suspended parser. Returns XML_STATUS_OK,
> XML_STATUS_ERROR, or XML_STATUS_SUSPENDED. If the parser has not
> been suspended, this returns XML_STATUS_ERROR, and
> XML_GetErrorCode() returns XML_ERROR_NOT_SUSPENDED. The parser is
> not invalidated in this case, and parsing may be continued with
> additional input using XML_Parse() or XML_ParseBuffer().
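The state rules for the two new functions can be modeled in a few lines. This is a toy, not Expat: the `toy_*` functions, the `ERR_*` codes (mirroring XML_ERROR_NOT_SUSPENDED and XML_ERROR_SUSPENDED), and the byte-per-event "parser" are hypothetical. It checks the rules described above: suspension is only allowed inside a callback, repeated suspend calls collapse into one, and resuming a parser that is not suspended is an error that leaves the parser usable.

```c
#include <stddef.h>

/* Toy model (hypothetical names), not Expat. */
enum status { STATUS_ERROR = 0, STATUS_OK = 1, STATUS_SUSPENDED = 2 };
enum toy_error { ERR_NONE, ERR_NOT_SUSPENDED, ERR_SUSPENDED };

struct toy_parser {
    const char *input;
    size_t len, pos;
    int in_callback, suspend_requested, suspended;
    enum toy_error error_code;
    int events_seen;
};

/* Modeled on XML_SuspendParser(). */
enum status toy_suspend(struct toy_parser *p) {
    if (!p->in_callback)
        return STATUS_ERROR;        /* not allowed outside a callback */
    p->suspend_requested = 1;
    return STATUS_OK;
}

/* Example callback: counts events and suspends at each '!'. */
static void handler(struct toy_parser *p, char event) {
    p->events_seen++;
    if (event == '!') {
        toy_suspend(p);
        toy_suspend(p);             /* a second call has no extra effect */
    }
}

static enum status toy_run(struct toy_parser *p) {
    while (p->pos < p->len) {
        p->in_callback = 1;
        handler(p, p->input[p->pos++]);
        p->in_callback = 0;
        if (p->suspend_requested) {
            p->suspend_requested = 0;
            p->suspended = 1;
            return STATUS_SUSPENDED;
        }
    }
    return STATUS_OK;
}

/* Stands in for a final XML_Parse() call on a complete document. */
enum status toy_parse(struct toy_parser *p, const char *doc, size_t len) {
    if (p->suspended) {
        p->error_code = ERR_SUSPENDED;
        return STATUS_ERROR;
    }
    p->input = doc;
    p->len = len;
    p->pos = 0;
    return toy_run(p);
}

/* Modeled on XML_ResumeParser(). */
enum status toy_resume(struct toy_parser *p) {
    if (!p->suspended) {
        p->error_code = ERR_NOT_SUSPENDED;  /* parser stays usable */
        return STATUS_ERROR;
    }
    p->suspended = 0;
    return toy_run(p);
}
```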
>
> The following functions change:
>
> XML_Status
> XML_Parse(XML_Parser parser, const char *s, int len, int isFinal)
>
> XML_Status
> XML_ParseBuffer(XML_Parser parser, int len, int isFinal)
>
> These two existing functions will change the meaning of their
> return value slightly. If parsing is suspended using
> XML_SuspendParser(), they will return XML_STATUS_SUSPENDED,
> otherwise the current values of XML_STATUS_OK and XML_STATUS_ERROR
> may be returned.
>
> If XML_STATUS_SUSPENDED is returned, the parse of the input
> document can only be resumed using XML_ResumeParser(). If either
> of these functions is called on a suspended parser,
> XML_STATUS_ERROR will be returned, and XML_GetErrorCode() will
> return XML_ERROR_SUSPENDED. The parser is not invalidated in this
> case, and parsing may still be resumed.
>
> void *
> XML_GetBuffer(XML_Parser parser, int len)
>
> If the parser has been suspended, returns NULL and
> XML_GetErrorCode() returns XML_ERROR_SUSPENDED. Parsing the input
> which has already been passed into Expat should be continued using
> XML_ResumeParser(). No changes if the parser was not suspended.
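Under Option 1, a main loop that feeds input in chunks has to notice the suspended status and drain the pending input with the resume call before supplying more. The toy below sketches that loop. Everything in it is hypothetical: the `toy_*` names, the byte-per-event "parser", the fixed 64-byte internal buffer, and the omission of `isFinal`. Its get-buffer and parse-buffer entry points refuse to run while suspended and set a SUSPENDED error code, as described above.

```c
#include <stddef.h>
#include <string.h>

/* Toy model (hypothetical names), not Expat. */
enum status { STATUS_ERROR = 0, STATUS_OK = 1, STATUS_SUSPENDED = 2 };
enum toy_error { ERR_NONE, ERR_NOT_SUSPENDED, ERR_SUSPENDED };
#define TOY_BUF 64

struct toy_parser {
    size_t len, pos;                /* bytes in buf / bytes consumed */
    int in_callback, suspend_requested, suspended;
    enum toy_error error_code;
    char buf[TOY_BUF];              /* internal input buffer */
    char seen[TOY_BUF];             /* events delivered so far */
    size_t nseen;
};

static enum status toy_suspend(struct toy_parser *p) {
    if (!p->in_callback)
        return STATUS_ERROR;
    p->suspend_requested = 1;
    return STATUS_OK;
}

/* Example callback: records each event, suspends at '!'. */
static void handler(struct toy_parser *p, char event) {
    if (p->nseen < sizeof p->seen)
        p->seen[p->nseen++] = event;
    if (event == '!')
        toy_suspend(p);
}

static enum status toy_run(struct toy_parser *p) {
    while (p->pos < p->len) {
        p->in_callback = 1;
        handler(p, p->buf[p->pos++]);
        p->in_callback = 0;
        if (p->suspend_requested) {
            p->suspend_requested = 0;
            p->suspended = 1;
            return STATUS_SUSPENDED;
        }
    }
    return STATUS_OK;
}

/* Modeled on XML_GetBuffer(): NULL while suspended. */
void *toy_get_buffer(struct toy_parser *p, size_t len) {
    if (p->suspended) {
        p->error_code = ERR_SUSPENDED;
        return NULL;
    }
    return len <= TOY_BUF ? p->buf : NULL;  /* size limit is a toy restriction */
}

/* Modeled on XML_ParseBuffer(): parse len bytes already written to buf. */
enum status toy_parse_buffer(struct toy_parser *p, size_t len) {
    if (p->suspended) {
        p->error_code = ERR_SUSPENDED;
        return STATUS_ERROR;
    }
    p->len = len;
    p->pos = 0;
    return toy_run(p);
}

enum status toy_resume(struct toy_parser *p) {
    if (!p->suspended) {
        p->error_code = ERR_NOT_SUSPENDED;
        return STATUS_ERROR;
    }
    p->suspended = 0;
    return toy_run(p);
}

/* Main loop: after a suspension, resume until the current chunk is drained. */
enum status feed_all(struct toy_parser *p, const char *const *chunks, size_t n) {
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(chunks[i]);
        char *dst = toy_get_buffer(p, len);
        enum status s;
        if (dst == NULL)
            return STATUS_ERROR;
        memcpy(dst, chunks[i], len);
        s = toy_parse_buffer(p, len);
        while (s == STATUS_SUSPENDED)
            s = toy_resume(p);      /* the application would do its work first */
        if (s == STATUS_ERROR)
            return STATUS_ERROR;
    }
    return STATUS_OK;
}
```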
>
>
> Potential Issues
> ----------------
>
> The risk inherent in this API variant is that it does change the
> interpretation of the return code for XML_Parse() and
> XML_ParseBuffer(). This is only significant if any callback ever
> calls XML_SuspendParser(). In the case of suspension,
> XML_STATUS_SUSPENDED would be returned, but an existing main loop will
> recognize this as a successful parse. This would be a programming
> error in the revised API, but not the old API. If the buffer being
> parsed was not the last buffer, a reasonable error would be returned
> when the main loop calls XML_Parse() or XML_ParseBuffer()
> again, but if the last input buffer was already passed (isFinal is
> true), there would be no opportunity to report the error, possibly
> making it difficult to diagnose application errors introduced by this
> change.
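The hazard can be reproduced with a legacy-style main loop that reads the result as a boolean. In this toy (hypothetical names; assumed values with the suspended status nonzero; a '!' byte stands for a callback that called the suspend function), the final buffer is suspended part-way through. The legacy check reports success although part of the input was never delivered, while an explicit check catches it.

```c
#include <stddef.h>

/* Toy model (hypothetical names and assumed values), not Expat. */
enum status { STATUS_ERROR = 0, STATUS_OK = 1, STATUS_SUSPENDED = 2 };

struct toy_parser { const char *input; size_t len, pos, delivered; };

/* Toy parse of the final buffer; '!' means "a callback asked to suspend". */
enum status toy_parse_final(struct toy_parser *p, const char *doc, size_t len) {
    p->input = doc;
    p->len = len;
    p->pos = 0;
    p->delivered = 0;
    while (p->pos < p->len) {
        char event = p->input[p->pos++];
        p->delivered++;             /* event handed to the callback */
        if (event == '!')
            return STATUS_SUSPENDED;
    }
    return STATUS_OK;
}

/* Pre-suspension main loop: "if (!result) report error". */
int legacy_loop_succeeds(struct toy_parser *p, const char *doc, size_t len) {
    return toy_parse_final(p, doc, len) != 0;   /* SUSPENDED looks like success */
}

/* Suspension-aware check: only STATUS_OK means the document is finished. */
int aware_loop_succeeds(struct toy_parser *p, const char *doc, size_t len) {
    return toy_parse_final(p, doc, len) == STATUS_OK;
}
```

A real suspension-aware loop would go on to call the resume function instead of stopping; the toy only shows how the two checks classify the same result.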
>
> We don't know how important this change is in practice for Expat
> 1.95.x users; we would appreciate feedback on the expat-discuss list.
>
>
> API Option 2
> ------------
>
> This version of the API changes provides increased backward
> compatibility, at the cost of a cruftier API to Expat.
>
> An alternate version of the API also adds the XML_SuspendParser() and
> XML_ResumeParser() functions, and the new XML_ERROR_* constants, but
> not the new XML_Status value. This variant would describe suspension
> as a pseudo-error from the XML_Parse() and XML_ParseBuffer()
> functions, allowing existing applications to report "errors" from the
> main loop if they had not been prepared for the suspension feature,
> but some callback function called XML_SuspendParser(). This would
> only be expected to occur during development, but applications that
> only suspend parsing occasionally may find that poorly tested code
> paths expose problems late in the development cycle or even after the
> application has entered production.
>
> The alternate version uses this description for XML_Parse() and
> XML_ParseBuffer():
>
> XML_Status
> XML_Parse(XML_Parser parser, const char *s, int len, int isFinal)
>
> XML_Status
> XML_ParseBuffer(XML_Parser parser, int len, int isFinal)
>
> If XML_STATUS_ERROR is returned, a main loop which supports the
> suspension feature needs to check whether XML_GetErrorCode(parser)
> == XML_ERROR_SUSPENDED. If so, the parse was suspended and the
> call to continue the parse needs to be XML_ResumeParser().
> Otherwise, the error is "real".
>
> This approach conflates error codes with the state of the parse, and
> labels the normal operation of the parser as an error.
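For comparison, an Option 2 main loop has to inspect the error code every time the error status comes back. The toy below is a hypothetical sketch, not Expat: suspension is reported as `STATUS_ERROR` plus `ERR_SUSPENDED`, a '!' byte stands for a callback that called the suspend function, and a '#' byte stands for malformed input.

```c
#include <stddef.h>

/* Toy model (hypothetical names), not Expat. Option 2 keeps two statuses. */
enum status { STATUS_ERROR = 0, STATUS_OK = 1 };
enum toy_error { ERR_NONE, ERR_NOT_SUSPENDED, ERR_SUSPENDED, ERR_SYNTAX };

struct toy_parser {
    const char *input;
    size_t len, pos, delivered;
    int suspended;
    enum toy_error error_code;
};

static enum status toy_run(struct toy_parser *p) {
    while (p->pos < p->len) {
        char event = p->input[p->pos++];
        if (event == '#') {         /* a "real" error */
            p->error_code = ERR_SYNTAX;
            return STATUS_ERROR;
        }
        p->delivered++;
        if (event == '!') {         /* suspension reported as a pseudo-error */
            p->suspended = 1;
            p->error_code = ERR_SUSPENDED;
            return STATUS_ERROR;
        }
    }
    p->error_code = ERR_NONE;
    return STATUS_OK;
}

enum status toy_parse(struct toy_parser *p, const char *doc, size_t len) {
    p->input = doc;
    p->len = len;
    p->pos = p->delivered = 0;
    p->suspended = 0;
    p->error_code = ERR_NONE;
    return toy_run(p);
}

enum status toy_resume(struct toy_parser *p) {
    if (!p->suspended) {
        p->error_code = ERR_NOT_SUSPENDED;
        return STATUS_ERROR;
    }
    p->suspended = 0;
    p->error_code = ERR_NONE;
    return toy_run(p);
}

/* Returns 1 for a complete parse, 0 for a real error. */
int parse_document(struct toy_parser *p, const char *doc, size_t len) {
    enum status s = toy_parse(p, doc, len);
    while (s == STATUS_ERROR && p->error_code == ERR_SUSPENDED)
        s = toy_resume(p);          /* not a real error: resume the parse */
    return s == STATUS_OK;          /* any other error is "real" */
}
```

The extra `error_code` test on every error path is the conflation of parse state and error reporting that this section describes.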
--
Mike Olson Principal Consultant
mike.olson@fourthought.com +1 303 583 9900 x 102
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, http://4Suite.org
Boulder, CO 80301-2537, USA
XML strategy, XML tools, knowledge management